Netdev List

* Re: [PATCH net-next 6/6] r8169: support RTL8168G
From: Hayes Wang @ 2012-07-10  5:36 UTC
  To: romieu; +Cc: netdev, linux-kernel, wfg, Hayes Wang
In-Reply-To: <c558386b836ee97762e12495101c6e373f20e69d.1341872752.git.romieu@fr.zoreil.com>

Fix incorrect argument in rtl_hw_init_8168g.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 drivers/net/ethernet/realtek/r8169.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 7ff3423..c29c5fb 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6753,14 +6753,14 @@ static void __devinit rtl_hw_init_8168g(struct rtl8169_private *tp)
 	msleep(1);
 	RTL_W8(MCU, RTL_R8(MCU) & ~NOW_IS_OOB);
 
-	data = r8168_mac_ocp_read(ioaddr, 0xe8de);
+	data = r8168_mac_ocp_read(tp, 0xe8de);
 	data &= ~(1 << 14);
 	r8168_mac_ocp_write(tp, 0xe8de, data);
 
 	if (!rtl_udelay_loop_wait_high(tp, &rtl_link_list_ready_cond, 100, 42))
 		return;
 
-	data = r8168_mac_ocp_read(ioaddr, 0xe8de);
+	data = r8168_mac_ocp_read(tp, 0xe8de);
 	data |= (1 << 15);
 	r8168_mac_ocp_write(tp, 0xe8de, data);
 
-- 
1.7.10.4

* [RFC] skbtrace: A trace infrastructure for networking subsystem
From: Li Yu @ 2012-07-10  6:07 UTC
  To: Linux Netdev List

Hi,

  This RFC introduces a tracing infrastructure for the networking
subsystem, together with a working prototype.

  I have noticed that blktrace helps file system and block subsystem
developers a lot; it can even help them find problems in the mm
subsystem. The "networkers" have no such luck. tcpdump is very useful,
but they still often have to start an investigation from the limited
set of exported statistics counters, then dig directly into the source
code to guess at possible solutions, then test their ideas, and, if
luck does not arrive, start another investigate-guess-test loop. This
is difficult and time-consuming, and it makes it hard to share
experience or report problems. Many users do not understand protocol
stack internals well enough, and I have seen "detailed reports" that
still carry no information useful for solving the problem.

  Unfortunately, the networking subsystem is rather performance
sensitive, so we cannot add overly detailed counters directly to it.
In fact, some folks have already tried to add more statistics counters
for detailed performance measurement, e.g. RFC 4898 and its
implementation, the Web10G project. Web10G is a great project for
researchers and engineers working on the TCP stack; it exports
per-connection details to userland through procfs or a netlink
interface. However, it depends tightly on TCP and its implementation,
so other protocol implementations would have to duplicate the work to
achieve the same goal. It also has measurable overhead (5% - 10% in my
simple netperf TCP_STREAM benchmark). I think such a powerful tracing
or instrumentation feature should be switchable off at runtime, with
zero overhead when it is off.

  So why don't we write a blktrace-like utility for our sweet
networking subsystem? This is it: "skbtrace". I hope it can:

1. Provide an extensible tracing infrastructure that supports various
protocols instead of one specific protocol.

2. Be enabled or disabled at runtime, with zero overhead when it
is off. I think jump-label-optimized trace points are a good choice
for implementing this.

3. Provide tracing details at the per-connection/per-skb level. Please
note that skbtrace is not only for sk_buff tracing; it can also track
socket events. This also means we need some form of filter, otherwise
we will get lost in tons of uninteresting trace data. I think BPF is a
good choice, but we need to extend BPF so that it can handle data
structures other than skb.

   The above is my basic idea; below are the details of the current
prototype implementation.

   Like blktrace, skbtrace is based on the tracepoints infrastructure
and the relay file system. However, I did not implement any tracers as
blktrace does, since I want to keep the kernel side as simple (and, I
hope, as fast) as possible. Basically, the trace points are just
optimized conditional statements; the slow path copies the traced data
to the ring buffer in the relay file system. The parameters of this
relay file system can be tuned through some exported files in the
skbtrace directory.
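
   To illustrate the "optimized conditional statement plus slow path"
shape described above, here is a minimal userspace sketch. It is not
the prototype's code: the names (trace_enabled, trace_point,
trace_slow_path, struct ring) and the buffer size are hypothetical, a
plain bool stands in for a jump-label key, and a simple byte ring
stands in for a relay channel. __builtin_expect is a GCC/Clang builtin.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define RING_SIZE 4096		/* hypothetical per-channel buffer size */

struct ring {
	unsigned char buf[RING_SIZE];
	size_t head;		/* next write offset */
};

static bool trace_enabled;	/* stand-in for a jump-label key */

/* Slow path: copy one record into the ring, wrapping at the end. */
static void trace_slow_path(struct ring *r, const void *rec, size_t len)
{
	const unsigned char *src = rec;
	size_t first;

	if (len > RING_SIZE)
		return;		/* record can never fit; drop it */

	first = RING_SIZE - r->head;
	if (first > len)
		first = len;
	memcpy(r->buf + r->head, src, first);
	memcpy(r->buf, src + first, len - first);
	r->head = (r->head + len) % RING_SIZE;
}

/* Fast path: one branch, hinted as unlikely, when tracing is off. */
static inline void trace_point(struct ring *r, const void *rec, size_t len)
{
	if (__builtin_expect(trace_enabled, 0))
		trace_slow_path(r, rec, len);
}
```

   With trace_enabled false, trace_point() costs a single branch. The
kernel's jump labels go further and patch that branch out of the
instruction stream while the trace point is disabled.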

  There are three trace data files (channels) in the relay file system
for each CPU. They represent the ring buffers mentioned above and hold
kernel trace data for the different contexts:

  (1) trace.hardirq.cpuN, saving trace data that comes from hardirq
context.
  (2) trace.softirq.cpuN, saving trace data that comes from softirq
context.
  (3) trace.syscall.cpuN, saving trace data that comes from process
context.

  Each trace record is written into one of the above channels,
depending on the context in which the trace point is called. Each trace
record is represented by a skbtrace_block struct; extended fields for
specific protocols can be appended to the end of it. For global
ordering of trace data, this patch uses a 64-bit atomic variable to
generate a sequence number for each trace record, so a userland utility
is able to sort out-of-order trace data across different channels
and/or CPUs.
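
  As a rough illustration of the userland side, the sketch below
restores global order by sorting on the sequence number. The struct
trace_rec layout, the channel numbering and the function names are
assumptions made for this example; they are not taken from
skbtrace_block.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userland view of a record header. */
struct trace_rec {
	uint64_t seq;	/* global sequence number from the kernel */
	int channel;	/* 0 = hardirq, 1 = softirq, 2 = syscall (assumed) */
	int cpu;
};

static int cmp_seq(const void *a, const void *b)
{
	const struct trace_rec *x = a, *y = b;

	/* Compare instead of subtracting: a 64-bit difference
	 * does not fit in the int that qsort() expects. */
	return (x->seq > y->seq) - (x->seq < y->seq);
}

/* Restore global order for records gathered from all channels and CPUs. */
static void sort_by_seq(struct trace_rec *recs, size_t n)
{
	qsort(recs, n, sizeof(*recs), cmp_seq);
}
```

  A real tool would more likely do a streaming merge of the per-CPU
files, since each file is already ordered; a full sort is simply the
shortest way to show the idea.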

  For the trace filter feature, I selected BPF as the core engine. So
far, it can only filter sk_buff-based traces; I plan to extend BPF to
support other data structures. In fact, I once wrote a custom filter
implementation for TCP/IPv4, but that approach requires refactoring
each specific protocol implementation. I did not like it and discarded
it.

  So far, I have implemented some skbtrace trace points:

  (1) skb_rps_info.

         I have seen some buggy drivers (or firmware?) that
         always set skb->rx_hash to zero. It also seems that RPS
         hashing does not work well in some corner cases.

  (2) tcp_connection and icsk_connection.

         These track the basic TCP state transitions, e.g. TCP_LISTEN.

  (3) tcp_sendlimit.

         Personally, I am interested in the reasons why tcp_write_xmit()
         exits.

  (4) tcp_congestion.

         Oops, it is the cwnd killer, isn't it?

  The userland utilities:

  (1) skbtrace, records raw trace data to regular disk files.
  (2) skbparse, parses raw trace data into human-readable strings.
                This still needs a lot of work; so far it is just a
                rough (but workable) demo for TCP/IPv4.

  You can get the source code on GitHub:

	https://github.com/Rover-Yu/skbtrace-userland
	https://github.com/Rover-Yu/skbtrace-kernel

  The source code of skbtrace-kernel is based on the net-next tree.

  Suggestions are welcome.

  Thanks.

Yu

* [net] net: Fix memory leak - vlan_info struct
From: Jeff Kirsher @ 2012-07-10  6:47 UTC
  To: davem; +Cc: Amir Hanania, netdev, gospo, sassmann, Jeff Kirsher

From: Amir Hanania <amir.hanania@intel.com>

A driver reload test shows a memory leak.
The vlan_info structure is not freed when the driver is removed.
It is not released because nr_vids is still one after the last VLAN is removed.
nr_vids stays at one because VLAN zero is added to the interface when the
interface is set up, but VLAN zero is not deleted at unregister.
Fix: delete VLAN zero when we unregister the device.

Signed-off-by: Amir Hanania <amir.hanania@intel.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 net/8021q/vlan.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 6089f0c..9096bcb 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -403,6 +403,9 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event,
 		break;
 
 	case NETDEV_DOWN:
+		if (dev->features & NETIF_F_HW_VLAN_FILTER)
+			vlan_vid_del(dev, 0);
+
 		/* Put all VLANs for this dev in the down state too.  */
 		for (i = 0; i < VLAN_N_VID; i++) {
 			vlandev = vlan_group_get_device(grp, i);
-- 
1.7.10.4

* Re: [PATCH net-next 6/6] r8169: support RTL8168G
From: Francois Romieu @ 2012-07-10  6:50 UTC
  To: Hayes Wang; +Cc: David S. Miller, netdev, linux-kernel, wfg
In-Reply-To: <1341898590-1253-1-git-send-email-hayeswang@realtek.com>

Hayes Wang <hayeswang@realtek.com> :
> Fix incorrect argument in rtl_hw_init_8168g.
> 
> Signed-off-by: Hayes Wang <hayeswang@realtek.com>

Thanks Hayes.

It's available with proper attribution and subject at:

git://violet.fr.zoreil.com/romieu/linux davem-next.r8169

-- 
Ueimor

* [v2 PATCH] ksz884x: fix Endian
From: roy.qing.li @ 2012-07-10  6:56 UTC
  To: netdev; +Cc: Tristram.Ha, bhutchings, joe

From: Li RongQing <roy.qing.li@gmail.com>

ETH_P_IPV6 is host endian and skb->protocol is big endian, so using
htons() on skb->protocol when comparing them is wrong.

Also fix two code style issues: wrong indentation and
unnecessary parentheses.
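
For illustration only, the comparison style the patch switches to can
be sketched in userspace C. The helper name is_ipv6() and the constant
name ETHERTYPE_IPV6_HOST are made up for this example; 0x86DD is the
IPv6 EtherType, i.e. the value of ETH_P_IPV6. On the usual platforms
htons() and ntohs() perform the same byte swap, so the old expression
most likely computed the same result; the point is to convert the
constant rather than the packet field.

```c
#include <arpa/inet.h>	/* htons() */
#include <stdbool.h>
#include <stdint.h>

#define ETHERTYPE_IPV6_HOST 0x86DD	/* IPv6 EtherType, host byte order */

/* proto_be is in network byte order, as skb->protocol is. */
static bool is_ipv6(uint16_t proto_be)
{
	/* Convert the constant, not the packet field; a constant
	 * conversion can typically be folded at compile time. */
	return proto_be == htons(ETHERTYPE_IPV6_HOST);
}
```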

CC: Tristram Ha <Tristram.Ha@micrel.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
CC: Joe Perches <joe@perches.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
---
 drivers/net/ethernet/micrel/ksz884x.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/micrel/ksz884x.c b/drivers/net/ethernet/micrel/ksz884x.c
index eaf9ff0..0fbe2e2 100644
--- a/drivers/net/ethernet/micrel/ksz884x.c
+++ b/drivers/net/ethernet/micrel/ksz884x.c
@@ -4881,8 +4881,8 @@ static netdev_tx_t netdev_tx(struct sk_buff *skb, struct net_device *dev)
 	left = hw_alloc_pkt(hw, skb->len, num);
 	if (left) {
 		if (left < num ||
-				((CHECKSUM_PARTIAL == skb->ip_summed) &&
-				(ETH_P_IPV6 == htons(skb->protocol)))) {
+		    (CHECKSUM_PARTIAL == skb->ip_summed &&
+		     skb->protocol == htons(ETH_P_IPV6))) {
 			struct sk_buff *org_skb = skb;
 
 			skb = netdev_alloc_skb(dev, org_skb->len);
-- 
1.7.1

* Re: [RFC PATCH] bridge: netfilter: fix skb->nf_bridge NULL panic in br_nf_forward_finish
From: Massimo Cetra @ 2012-07-10  6:58 UTC
  To: Lin Ming
  Cc: Massimo Cetra, Eric Dumazet, netdev, Stephen Hemminger,
	David S. Miller, Julian Anastasov
In-Reply-To: <CAF1ivSZBMWYc5iKxhX5d_ykkMD4LauFP9M10dBwfmqvpYj=pHg@mail.gmail.com>

On 09/07/2012 14:00, Lin Ming wrote:

>> i spent a couple of days trying to figure out how to reproduce but you were
>> quicker and smarter than me.
>
> Could you also test it ? :-)
>

Of course.

I have already installed a 3.5-rc and a 3.2.22 kernel with this patch
and, so far, I see no problems.

I'm only waiting a couple of days before reporting, to be sure the issue 
is gone.

Massimo

* Re: [PATCH net-next 6/6] r8169: support RTL8168G
From: Hayes Wang @ 2012-07-10  7:12 UTC
  To: romieu; +Cc: netdev, linux-kernel, Hayes Wang
In-Reply-To: <1341898590-1253-1-git-send-email-hayeswang@realtek.com>

1. Remove rtl_ocpdr_cond. No waiting is needed for mac_ocp_{write / read}.
2. Set ocp_base to OCP_STD_PHY_BASE after rtl8168g_1_hw_phy_config.
---
 drivers/net/ethernet/realtek/r8169.c |   14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index c29c5fb..7269175 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -1043,13 +1043,6 @@ static void rtl_w1w0_phy_ocp(struct rtl8169_private *tp, int reg, int p, int m)
 	r8168_phy_ocp_write(tp, reg, (val | p) & ~m);
 }
 
-DECLARE_RTL_COND(rtl_ocpdr_cond)
-{
-	void __iomem *ioaddr = tp->mmio_addr;
-
-	return RTL_R32(OCPDR) & OCPAR_FLAG;
-}
-
 static void r8168_mac_ocp_write(struct rtl8169_private *tp, u32 reg, u32 data)
 {
 	void __iomem *ioaddr = tp->mmio_addr;
@@ -1058,8 +1051,6 @@ static void r8168_mac_ocp_write(struct rtl8169_private *tp, u32 reg, u32 data)
 		return;
 
 	RTL_W32(OCPDR, OCPAR_FLAG | (reg << 15) | data);
-
-	rtl_udelay_loop_wait_low(tp, &rtl_ocpdr_cond, 25, 10);
 }
 
 static u16 r8168_mac_ocp_read(struct rtl8169_private *tp, u32 reg)
@@ -1071,8 +1062,7 @@ static u16 r8168_mac_ocp_read(struct rtl8169_private *tp, u32 reg)
 
 	RTL_W32(OCPDR, reg << 15);
 
-	return rtl_udelay_loop_wait_high(tp, &rtl_ocpdr_cond, 25, 10) ?
-		RTL_R32(OCPDR) : ~0;
+	return RTL_R32(OCPDR);
 }
 
 #define OCP_STD_PHY_BASE	0xa400
@@ -3417,6 +3407,8 @@ static void rtl8168g_1_hw_phy_config(struct rtl8169_private *tp)
 	rtl_w1w0_phy_ocp(tp, 0xa438, 0x8000, 0x0000);
 
 	rtl_w1w0_phy_ocp(tp, 0xc422, 0x4000, 0x2000);
+
+	rtl_writephy(tp, 0x1f, 0x0000);
 }
 
 static void rtl8102e_hw_phy_config(struct rtl8169_private *tp)
-- 
1.7.10.4

* Re: TCP transmit performance regression
From: Ming Lei @ 2012-07-10  7:22 UTC
  To: Eric Dumazet; +Cc: Network Development, David Miller
In-Reply-To: <1341895143.3265.4049.camel@edumazet-glaptop>

On Tue, Jul 10, 2012 at 12:39 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Please dont send private messages for discussing general linux stuff.
>
> Next time I wont reply.
>
> On Tue, 2012-07-10 at 12:00 +0800, Ming Lei wrote:
>> On Mon, Jul 9, 2012 at 9:54 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Mon, 2012-07-09 at 21:23 +0800, Ming Lei wrote:
>> >
>> >> Looks the patch replaces skb_clone with netdev_alloc_skb_ip_align and
>> >> introduces extra copies on incoming data, so would you mind explaining
>> >> it in a bit detail? And why is skb_clone not OK for the purpose?
>> >
>> > Problem with cloning is that some paths will have to make a private copy
>> > of the skb.
>>
>> Looks you convert some private copy into all copy in rx path, :-)
>
> For small speed device, a copy is probably unnoticed.

The copy still has some effect on low-speed devices; for example, your recent
patch to the asix driver improves tx performance from ~75M to ~92M.

>
> rtl8169 does that (copybreak) for security issues on Gbps link speed,
> and I get Gbps link speed on an old AMD host with no problem.
>
> As you discovered, the slowdown comes from SLAB debug on the 30K huge
> skb. To recover from this we must patch usbnet to not constantly
> allocate/free such big RX skb but recycle them. Once we do that, you'll
> find out that copybreak improves general performance on low ram devices
> by an order of magnitude.

Looks your copybreak patch doesn't improve tx performance on smsc95xx.

>> >
>> > So you dont see the cost here in the driver, but later in upper stacks.
>> >
>> > Since this driver defaults to a huge RX area of more than 16Kbytes,
>> > a copy to a much smaller skb (we call this 'copybreak' in our jargon )
>> > is more than welcome to avoid OOM problems anyway.
>>
>> Looks 'memory compaction' has been implemented already to address
>> the big buffer allocation problem.
>
> Usually its too late (not enough ram to perform the compaction), and
> a collapse having to compact 3MB is very expensive and blows cpu caches.
>
> I noticed that on machines with 1GB or 2GB ram. These machines are
> called ChromeBooks and every lost network frame is analyzed in Google.
> And we had problems because some wifi adapters use 8KB skbs for incoming
> frames.

The kernel stack size is 8KB or more, so did you also see process creation
failures on your ChromeBook machines at the same time?

> (Not even 32KB !!! This is just crazy !!)
>
> Relying on TCP collapsing is just very lazy. What about other
> protocols ?
>
> I guess that on beagle this can happen very fast.

Previously I only found usbnet OOMs triggered by
kmalloc(GFP_ATOMIC), while kmalloc(GFP_KERNEL) could still succeed.
Some time later, the problem disappeared.

>>
>> Also the allocated huge RX SKB buffer will be freed after all cloned buffers
>> are consumed, so I still don't know what is the real problem with cloned buffer.
>>
>
> IF they are consumed.
>
> But IF they arent because application is not fast enough to drain, you
> end with sockets storing huge amount of data in their receive buffer.
>
> So a single 100 bytes payload holds the 32KB block.
>
> If you allowed your UDP socket to store 130.000 bytes of payload, you
> can consume 1.300 * 32KB = ~40 MB

Looks it is one advantage of copybreak.
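
The figures quoted above can be checked with a few lines of C. This is
only a back-of-the-envelope sketch: pinned_bytes() and its parameter
names are made up for the example, and it assumes every queued datagram
is a clone that pins one full 32KB receive block.

```c
#include <stdint.h>

/* Bytes of receive buffers kept alive when every queued datagram is a
 * clone that pins a whole block_size buffer. */
static uint64_t pinned_bytes(uint64_t queued_payload, uint64_t payload_per_skb,
			     uint64_t block_size)
{
	uint64_t nr_skbs = queued_payload / payload_per_skb;

	return nr_skbs * block_size;
}
```

pinned_bytes(130000, 100, 32 * 1024) corresponds to 1,300 pinned
blocks, or 42,598,400 bytes (about 40 MB), held for 130,000 bytes of
useful payload.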

>
>
>> >
>> > TCP coalescing (skb_try_coalesce) for example wont work for cloned skbs,
>> > so TCP receive window will close pretty fast, and performance sucks in
>> > lossy environments (like the Internet)
>>
>> I didn't observe the above thing, so could you provide a way to reproduce it?
>>
>
> netstat -s can show you interesting TCP counters. But as driver lies on
> skb->truesize, you can also have unexpected crashes with malicious
> senders. With a 64 ratio, its easy to consume all ram.
>
> TCP coalescing is great as soon as you have Out Of Order queueing
> because of packet losses. You avoid expensive collapses and
> dropping/purge of OFO queue. Sender has to resend previously sent data.
>
>> Suppose the above is true, looks skb_clone is useless, isn't it?
>
> cloning has some uses, for example if you dont need to touch packet
> content, only mess with skb->data, skb->len, skb->tail.
>
> But if you need to change a single bit in the payload, or play with skb
> fragments (struct skb_shared_info), you have to make a full copy of the
> 30KB buffer, even if the skb contained only 10 bytes of payload.

So netdev_alloc_skb_ip_align() can be replaced with skb_clone() in the
asix driver, since no bits are touched in asix_rx_fixup? The default MTU is
1500 and rx_urb_size is 2048.

If so, could we use copybreak only when rx_urb_size > 4096?
And for ax88172, dev->rx_urb_size is always 2048, so it looks like the copy
is not needed at all.

> I would just switch off turbo mode by default, I doubt it has any
> advantage.

At least for smsc95xx, I think 32K buffer is not worthy of the feature.

>
> Coalescing up to 16K of incoming frames adds latency for no performance
> gain, once you do it the right way (that is without OOM risks).
> Currently, skb->truesize lie is very bad.
>



Thanks,
-- 
Ming Lei

* Re: linux-next: build failure after merge of the net-next tree
From: Bjørn Mork @ 2012-07-10  7:25 UTC
  To: Stephen Rothwell; +Cc: David Miller, netdev, linux-next, linux-kernel
In-Reply-To: <20120710130848.1014fbe05e5146a33a3c7d39@canb.auug.org.au>

Stephen Rothwell <sfr@canb.auug.org.au> writes:

> Hi all,
>
> After merging the net-next tree, today's linux-next build (x86_64
> allmodconfig) failed like this:
>
> drivers/net/usb/qmi_wwan.c:381:13: error: 'qmi_wwan_unbind_shared' undeclared here (not in a function)
>
> Caused by a bad automatic merge between commit 6fecd35d4cd7 ("net:
> qmi_wwan: add ZTE MF60") from the net tree and commit 230718bda1be ("net:
> qmi_wwan: bind to both control and data interface") from the net-next
> tree.
>
> I added the following merge fix patch:
>
> From: Stephen Rothwell <sfr@canb.auug.org.au>
> Date: Tue, 10 Jul 2012 13:06:01 +1000
> Subject: [PATCH] net: fix for qmi_wwan_unbind_shared changes
>
> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
> ---
>  drivers/net/usb/qmi_wwan.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
> index 06cfcc7..85c983d 100644
> --- a/drivers/net/usb/qmi_wwan.c
> +++ b/drivers/net/usb/qmi_wwan.c
> @@ -378,7 +378,7 @@ static const struct driver_info qmi_wwan_force_int2 = {
>  	.description	= "Qualcomm WWAN/QMI device",
>  	.flags		= FLAG_WWAN,
>  	.bind		= qmi_wwan_bind_shared,
> -	.unbind		= qmi_wwan_unbind_shared,
> +	.unbind		= qmi_wwan_unbind,
>  	.manage_power	= qmi_wwan_manage_power,
>  	.data		= BIT(2), /* interface whitelist bitmap */
>  };


Looks good.  Thanks.


Bjørn

* net-next kernel NULL pointer dereference at fib_rules_tclass
From: Or Gerlitz @ 2012-07-10  7:16 UTC
  To: David Miller; +Cc: netdev, Shlomo Pongratz, Amir Vadai, Erez Shitrit

Hi Dave,

Using the latest net-next (061a5c316b6526dbc729049a16243ec27937cc31) I
get the crash below during the boot cycle. The crash happens on a set of
nodes which use igb for their onboard 1G NIC, as soon as the device goes
up. Another group of nodes, in a second lab, which use bnx2 for their 1G
NIC, does not hit this crash, but the kernel there is built with a different
.config.

Or.

Bringing up loopback interface:  [  OK  ]
Bringing up interface eth1:
Determining IP information for eth1...IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
Starting system logger: BUG: unable to handle kernel NULL pointer dereference at 00000000000000ac
IP: [<ffffffff81320393>] fib_rules_tclass+0xf/0x17
PGD 223171067 PUD 22353e067 PMD 0
Oops: 0000 [#1] SMP
CPU 0
Modules linked in:
 ipv6 dm_mirror dm_region_hash dm_log uinput igb ptp pps_core mlx4_ib ib_mad ib_core mlx4_en mlx4_core sg kvm_intel kvm microcode pcspkr rng_core ioatdma dca shpchp dm_mod button sr_mod ext3 jbd sd_mod usb_storage ata_piix libata scsi_mod ehci_hcd uhci_hcd floppy [last unloaded: scsi_wait_scan]

Pid: 0, comm: swapper/0 Not tainted 3.5.0-rc5-12540-g061a5c3-dirty #94 Supermicro X7DWU/X7DWU
RIP: 0010:[<ffffffff81320393>]  [<ffffffff81320393>] fib_rules_tclass+0xf/0x17
RSP: 0018:ffff88022fc03a30  EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff88022fc03b54 RCX: 0000000000000050
RDX: 0000000000000020 RSI: 0000000000000001 RDI: ffff88022fc03a40
RBP: ffff88022fc03a30 R08: ffff88022fc03a70 R09: ffff88022fc03a40
R10: 0000000000000020 R11: ffff880225390a80 R12: 0000000000000001
R13: ffff88021cc7a000 R14: 0000000000000000 R15: ffff8802269c26c0
FS:  0000000000000000(0000) GS:ffff88022fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000000ac CR3: 0000000222aeb000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper/0 (pid: 0, threadinfo ffffffff81600000, task ffffffff81613410)
Stack:
 ffff88022fc03ac0 ffffffff81318956 ffff8802fd010010 ffff8802232d5a80
 ffff880222add880 ffff880223269a98 0000000000000020 ffff880200000000
 0000000100000000 ffff000000000000 12311eac2540eaf0 ffff88027e001eac
Call Trace:
 <IRQ>

 [<ffffffff81318956>] fib_validate_source+0x170/0x2a5
 [<ffffffff812e6603>] ip_route_input_common+0x6fe/0xd12
 [<ffffffff812e8380>] ? ip_rcv_finish+0x70/0x457
 [<ffffffff812e8461>] ip_rcv_finish+0x151/0x457
 [<ffffffff812e8380>] ? ip_rcv_finish+0x70/0x457
 [<ffffffff812e89a1>] ip_rcv+0x23a/0x260
 [<ffffffff812beae7>] __netif_receive_skb+0x3ac/0x415
 [<ffffffff812be86f>] ? __netif_receive_skb+0x134/0x415
 [<ffffffff81312ae5>] ? inet_gro_receive+0x81/0x23f
 [<ffffffff812b68da>] ? skb_free_head+0x47/0x49
 [<ffffffff812c035d>] netif_receive_skb+0xee/0xf7
 [<ffffffff812c071d>] ? dev_gro_receive+0x15f/0x2fb
 [<ffffffff812c063a>] ? dev_gro_receive+0x7c/0x2fb
 [<ffffffff81065644>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff812c044c>] napi_skb_finish+0x24/0x56
 [<ffffffff812c0bf0>] napi_gro_receive+0x10f/0x11e
 [<ffffffffa0216e85>] igb_poll+0x843/0xae5 [igb]
 [<ffffffff812c0e01>] ? net_rx_action+0x14c/0x1ee
 [<ffffffff812c0d76>] net_rx_action+0xc1/0x1ee
 [<ffffffff8102f746>] __do_softirq+0xff/0x1de
 [<ffffffff813631cc>] call_softirq+0x1c/0x26
 [<ffffffff81003090>] do_softirq+0x38/0x80
 [<ffffffff8102f41f>] irq_exit+0x4e/0x83
 [<ffffffff810028f9>] do_IRQ+0x98/0xaf
 [<ffffffff8135b52c>] common_interrupt+0x6c/0x6c
 <EOI>

 [<ffffffff810083ec>] ? mwait_idle+0x13c/0x208
 [<ffffffff810083e3>] ? mwait_idle+0x133/0x208
 [<ffffffff810088d1>] cpu_idle+0x6e/0xab
 [<ffffffff81343e13>] rest_init+0xc7/0xce
 [<ffffffff81343d4c>] ? csum_partial_copy_generic+0x16c/0x16c
 [<ffffffff8167fbf3>] start_kernel+0x332/0x33f
 [<ffffffff8167f6f6>] ? kernel_init+0x19d/0x19d
 [<ffffffff8167f2b4>] x86_64_start_reservations+0xb8/0xbd
 [<ffffffff8167f3a6>] x86_64_start_kernel+0xed/0xf4
Code: 81 31 c0 e8 a5 bb dd ff 48 83 c4 28 31 c0 5b 41 5c 41 5d 41 5e 41 5f c9 c3 90 90 90 48 8b 57 20 55 31 c0 48 89 e5 48 85 d2 74 06 <8b> 82 8c 00 00 00 c9 c3 8b 47 7c 33 46 14 85 87 80 00 00 00 55
RIP  [<ffffffff81320393>] fib_rules_tclass+0xf/0x17
 RSP <ffff88022fc03a30>
CR2: 00000000000000ac
---[ end trace e7c6714b8de1c341 ]---
Kernel panic - not syncing: Fatal exception in interrupt

* net-next kernel NULL pointer dereference at fib_rules_tclass
From: Or Gerlitz @ 2012-07-10  7:29 UTC
  To: David Miller
  Cc: netdev@vger.kernel.org, Amir Vadai, Shlomo Pongratz, Erez Shitrit

Hi Dave,

Using the latest net-next (061a5c316b6526dbc729049a16243ec27937cc31) I
get the crash below during the boot cycle. The crash happens on a set of
nodes which use igb for their onboard 1G NIC, as soon as the device goes
up. Another group of nodes, in a second lab, which use bnx2 for their 1G
NIC, does not hit this crash, but the kernel there is built with a different
.config.

Or.


Bringing up loopback interface:  [  OK  ]
Bringing up interface eth1:
Determining IP information for eth1...IPv6: ADDRCONF(NETDEV_UP): eth1:
link is not ready
igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
Starting system logger: BUG: unable to handle kernel NULL pointer
dereference at 00000000000000ac
IP: [<ffffffff81320393>] fib_rules_tclass+0xf/0x17
PGD 223171067 PUD 22353e067 PMD 0
Oops: 0000 [#1] SMP
CPU 0
Modules linked in:
  ipv6 dm_mirror dm_region_hash dm_log uinput igb ptp pps_core mlx4_ib
ib_mad ib_core mlx4_en mlx4_core sg kvm_intel kvm microcode pcspkr
rng_core ioatdma dca shpchp dm_mod button sr_mod ext3 jbd sd_mod
usb_storage ata_piix libata scsi_mod ehci_hcd uhci_hcd floppy [last
unloaded: scsi_wait_scan]

Pid: 0, comm: swapper/0 Not tainted 3.5.0-rc5-12540-g061a5c3-dirty #94
Supermicro X7DWU/X7DWU
RIP: 0010:[<ffffffff81320393>]  [<ffffffff81320393>]
fib_rules_tclass+0xf/0x17
RSP: 0018:ffff88022fc03a30  EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff88022fc03b54 RCX: 0000000000000050
RDX: 0000000000000020 RSI: 0000000000000001 RDI: ffff88022fc03a40
RBP: ffff88022fc03a30 R08: ffff88022fc03a70 R09: ffff88022fc03a40
R10: 0000000000000020 R11: ffff880225390a80 R12: 0000000000000001
R13: ffff88021cc7a000 R14: 0000000000000000 R15: ffff8802269c26c0
FS:  0000000000000000(0000) GS:ffff88022fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000000ac CR3: 0000000222aeb000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper/0 (pid: 0, threadinfo ffffffff81600000, task
ffffffff81613410)
Stack:
  ffff88022fc03ac0 ffffffff81318956 ffff8802fd010010 ffff8802232d5a80
  ffff880222add880 ffff880223269a98 0000000000000020 ffff880200000000
  0000000100000000 ffff000000000000 12311eac2540eaf0 ffff88027e001eac
Call Trace:
  <IRQ>

  [<ffffffff81318956>] fib_validate_source+0x170/0x2a5
  [<ffffffff812e6603>] ip_route_input_common+0x6fe/0xd12
  [<ffffffff812e8380>] ? ip_rcv_finish+0x70/0x457
  [<ffffffff812e8461>] ip_rcv_finish+0x151/0x457
  [<ffffffff812e8380>] ? ip_rcv_finish+0x70/0x457
  [<ffffffff812e89a1>] ip_rcv+0x23a/0x260
  [<ffffffff812beae7>] __netif_receive_skb+0x3ac/0x415
  [<ffffffff812be86f>] ? __netif_receive_skb+0x134/0x415
  [<ffffffff81312ae5>] ? inet_gro_receive+0x81/0x23f
  [<ffffffff812b68da>] ? skb_free_head+0x47/0x49
  [<ffffffff812c035d>] netif_receive_skb+0xee/0xf7
[<ffffffff812c071d>] ? dev_gro_receive+0x15f/0x2fb
  [<ffffffff812c063a>] ? dev_gro_receive+0x7c/0x2fb
  [<ffffffff81065644>] ? trace_hardirqs_on+0xd/0xf
  [<ffffffff812c044c>] napi_skb_finish+0x24/0x56
  [<ffffffff812c0bf0>] napi_gro_receive+0x10f/0x11e
  [<ffffffffa0216e85>] igb_poll+0x843/0xae5 [igb]
  [<ffffffff812c0e01>] ? net_rx_action+0x14c/0x1ee
  [<ffffffff812c0d76>] net_rx_action+0xc1/0x1ee
  [<ffffffff8102f746>] __do_softirq+0xff/0x1de
  [<ffffffff813631cc>] call_softirq+0x1c/0x26
  [<ffffffff81003090>] do_softirq+0x38/0x80
  [<ffffffff8102f41f>] irq_exit+0x4e/0x83
  [<ffffffff810028f9>] do_IRQ+0x98/0xaf
  [<ffffffff8135b52c>] common_interrupt+0x6c/0x6c
  <EOI>

  [<ffffffff810083ec>] ? mwait_idle+0x13c/0x208
  [<ffffffff810083e3>] ? mwait_idle+0x133/0x208
  [<ffffffff810088d1>] cpu_idle+0x6e/0xab
  [<ffffffff81343e13>] rest_init+0xc7/0xce
  [<ffffffff81343d4c>] ? csum_partial_copy_generic+0x16c/0x16c
  [<ffffffff8167fbf3>] start_kernel+0x332/0x33f
  [<ffffffff8167f6f6>] ? kernel_init+0x19d/0x19d
  [<ffffffff8167f2b4>] x86_64_start_reservations+0xb8/0xbd
  [<ffffffff8167f3a6>] x86_64_start_kernel+0xed/0xf4
Code: 81 31 c0 e8 a5 bb dd ff 48 83 c4 28 31 c0 5b 41 5c 41 5d 41 5e 41
5f c9 c3 90 90 90 48 8b 57 20 55 31 c0 48 89 e5 48 85 d2 74 06 <8b> 82
8c 00 00 00 c9 c3 8b 47 7c 33 46 14 85 87 80 00 00 00 55
RIP  [<ffffffff81320393>] fib_rules_tclass+0xf/0x17
  RSP <ffff88022fc03a30>
CR2: 00000000000000ac
---[ end trace e7c6714b8de1c341 ]---
Kernel panic - not syncing: Fatal exception in interrupt

* Re: 82571EB: Detected Hardware Unit Hang
From: Joe Jin @ 2012-07-10  7:40 UTC
  To: Joe Jin; +Cc: e1000-devel, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <4FFA9B96.6040901@oracle.com>

While debugging the driver, I found that before the "Detected Hardware Unit
Hang" message the driver is unable to clean and reclaim the resources:

        while ((eop_desc->upper.data & cpu_to_le32(E1000_TXD_STAT_DD)) &&  <== upper.data is always 0x300 here
               (count < tx_ring->count)) {
     <--- snip --->
        }


I checked all of the driver code and did not find anywhere that sets
E1000_TXD_STAT_DD in upper.data, so I guess upper.data is set by the hardware?
And what happens if the OS is a 32-bit system?
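
For reference, the loop condition above tests a single bit of the
descriptor status. The sketch below assumes E1000_TXD_STAT_DD is bit 0
("Descriptor Done"), as in the e1000e headers; the helper name
tx_desc_done() and the local macro are hypothetical. With the observed
value 0x300 that bit is clear, so the cleanup loop body is never
entered.

```c
#include <stdbool.h>
#include <stdint.h>

#define TXD_STAT_DD 0x00000001u	/* Descriptor Done; value assumed */

/* status_cpu is the descriptor's upper.data already converted to CPU
 * byte order (the driver converts the mask with cpu_to_le32() instead). */
static bool tx_desc_done(uint32_t status_cpu)
{
	return (status_cpu & TXD_STAT_DD) != 0;
}
```

Note that cpu_to_le32() deals with byte order, not word size, so the
test itself should behave the same on a 32-bit little-endian kernel.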

Thanks in advance,
Joe 

On 07/09/12 16:51, Joe Jin wrote:
> Hi list,
> 
> I'm seeing a Unit Hang even with the latest e1000e driver 2.0.0 when doing
> an scp test. This issue is easy to reproduce on a SUN FIRE X2270 M2: just
> copying a big file (>500M) from another server hits it at once.
> 
> Would you please help on this?
> 
> device info:
> # lspci -s 05:00.0 
> 05:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) (rev 06)
> 
> # lspci -s 05:00.0 -n
> 05:00.0 0200: 8086:10bc (rev 06)
> 
> # ethtool -i eth0
> driver: e1000e
> version: 2.0.0-NAPI
> firmware-version: 5.10-2
> bus-info: 0000:05:00.0
> 
> # ethtool -k eth0
> Offload parameters for eth0:
> rx-checksumming: on
> tx-checksumming: on
> scatter-gather: on
> tcp segmentation offload: on
> udp fragmentation offload: off
> generic segmentation offload: on
> generic-receive-offload: on
> 
> kernel log:
> -----------
> e1000e 0000:05:00.0: eth0: Detected Hardware Unit Hang:
>   TDH                  <6c>
>   TDT                  <81>
>   next_to_use          <81>
>   next_to_clean        <6b>
> buffer_info[next_to_clean]:
>   time_stamp           <fffc7a23>
>   next_to_watch        <71>
>   jiffies              <fffc8c0c>
>   next_to_watch.status <0>
> MAC Status             <80387>
> PHY Status             <792d>
> PHY 1000BASE-T Status  <3c00>
> PHY Extended Status    <3000>
> PCI Status             <10>
> e1000e 0000:05:00.0: eth0: Detected Hardware Unit Hang:
>   TDH                  <6c>
>   TDT                  <81>
>   next_to_use          <81>
>   next_to_clean        <6b>
> buffer_info[next_to_clean]:
>   time_stamp           <fffc7a23>
>   next_to_watch        <71>
>   jiffies              <fffc9bac>
>   next_to_watch.status <0>
> MAC Status             <80387>
> PHY Status             <792d>
> PHY 1000BASE-T Status  <3c00>
> PHY Extended Status    <3000>
> PCI Status             <10>
> ------------[ cut here ]------------
> WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x225/0x230()
> Hardware name: SUN FIRE X2270 M2
> NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
> Modules linked in: autofs4 hidp rfcomm bluetooth rfkill lockd sunrpc cpufreq_ondemand acpi_cpufreq mperf be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi video sbs sbshc acpi_pad acpi_ipmi ipmi_msghandler parport_pc lp parport e1000e(U) snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device igb snd_pcm_oss serio_raw snd_mixer_oss snd_pcm tpm_infineon snd_timer snd soundcore snd_page_alloc i2c_i801 iTCO_wdt i2c_core pcspkr i7core_edac iTCO_vendor_support ioatdma ghes dca edac_core hed dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage sd_mod crc_t10dif sg ahci libahci ext3 jbd mbcache [last unloaded: microcode]
> Pid: 0, comm: swapper Not tainted 2.6.39-200.24.1.el5uek #1
> Call Trace:
>  [<c07d9ac5>] ? dev_watchdog+0x225/0x230
>  [<c045ba61>] warn_slowpath_common+0x81/0xa0
>  [<c07d9ac5>] ? dev_watchdog+0x225/0x230
>  [<c045bb23>] warn_slowpath_fmt+0x33/0x40
>  [<c07d9ac5>] dev_watchdog+0x225/0x230
>  [<c07d98a0>] ? dev_activate+0xb0/0xb0
>  [<c0468e82>] call_timer_fn+0x32/0xf0
>  [<c04bceb0>] ? rcu_check_callbacks+0x80/0x80
>  [<c046a76d>] run_timer_softirq+0xed/0x1b0
>  [<c07d98a0>] ? dev_activate+0xb0/0xb0
>  [<c0461a81>] __do_softirq+0x91/0x1a0
>  [<c04619f0>] ? local_bh_enable+0x80/0x80
>  <IRQ>  [<c0462295>] ? irq_exit+0x95/0xa0
>  [<c087f8b8>] ? smp_apic_timer_interrupt+0x38/0x42
>  [<c08784f5>] ? apic_timer_interrupt+0x31/0x38
>  [<c046007b>] ? do_exit+0x11b/0x370
>  [<c065eae4>] ? intel_idle+0xa4/0x100
>  [<c078d9b9>] ? cpuidle_idle_call+0xb9/0x1e0
>  [<c0411d77>] ? cpu_idle+0x97/0xd0
>  [<c085cbbd>] ? rest_init+0x5d/0x70
>  [<c0b07a7a>] ? start_kernel+0x28a/0x340
>  [<c0b074b0>] ? obsolete_checksetup+0xb0/0xb0
>  [<c0b070a4>] ? i386_start_kernel+0x64/0xb0
> ---[ end trace 5502b55cd4d4e5cb ]---
> e1000e 0000:05:00.0: eth0: Reset adapter
> e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> 
> Thanks,
> Joe
> 


-- 
Oracle <http://www.oracle.com>
Joe Jin | Software Development Senior Manager | +8610.6106.5624
ORACLE | Linux and Virtualization
No. 24 Zhongguancun Software Park, Haidian District | 100193 Beijing 

^ permalink raw reply

* [v2 PATCH] qlge: fix endian issue
From: roy.qing.li @ 2012-07-10  8:02 UTC (permalink / raw)
  To: netdev

From: Li RongQing <roy.qing.li@gmail.com>

commit 6d29b1ef introduced a bug: ntohs is __be16_to_cpu,
not cpu_to_be16.

Always apply htons to IP_OFFSET and IP_MF before comparing
them with a field taken from a network packet.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
---
v2 : Change my name
 drivers/net/ethernet/qlogic/qlge/qlge_main.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_main.c b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
index 09d8d33..7c520fa 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge_main.c
+++ b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
@@ -1546,7 +1546,7 @@ static void ql_process_mac_rx_page(struct ql_adapter *qdev,
 			struct iphdr *iph =
 				(struct iphdr *) ((u8 *)addr + ETH_HLEN);
 			if (!(iph->frag_off &
-				cpu_to_be16(IP_MF|IP_OFFSET))) {
+				htons(IP_MF|IP_OFFSET))) {
 				skb->ip_summed = CHECKSUM_UNNECESSARY;
 				netif_printk(qdev, rx_status, KERN_DEBUG,
 					     qdev->ndev,
@@ -1654,7 +1654,7 @@ static void ql_process_mac_rx_skb(struct ql_adapter *qdev,
 			/* Unfragmented ipv4 UDP frame. */
 			struct iphdr *iph = (struct iphdr *) skb->data;
 			if (!(iph->frag_off &
-				ntohs(IP_MF|IP_OFFSET))) {
+				htons(IP_MF|IP_OFFSET))) {
 				skb->ip_summed = CHECKSUM_UNNECESSARY;
 				netif_printk(qdev, rx_status, KERN_DEBUG,
 					     qdev->ndev,
@@ -1968,7 +1968,7 @@ static void ql_process_mac_split_rx_intr(struct ql_adapter *qdev,
 		/* Unfragmented ipv4 UDP frame. */
 			struct iphdr *iph = (struct iphdr *) skb->data;
 			if (!(iph->frag_off &
-				ntohs(IP_MF|IP_OFFSET))) {
+				htons(IP_MF|IP_OFFSET))) {
 				skb->ip_summed = CHECKSUM_UNNECESSARY;
 				netif_printk(qdev, rx_status, KERN_DEBUG, qdev->ndev,
 					     "TCP checksum done!\n");
-- 
1.7.1

^ permalink raw reply related
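
The byte-order point in the qlge patch above can be sketched in userspace C. This is only an illustration under stated assumptions: EX_IP_MF and EX_IP_OFFSET are local copies of the kernel's IP_MF (0x2000) and IP_OFFSET (0x1FFF) constants, and is_fragment() is a hypothetical helper, not a driver function. The mask is a host-order constant, so it is converted with htons() before being ANDed with the big-endian frag_off field. On the usual platforms htons() and ntohs() perform the same swap, so the distinction is mostly about stating the direction of the conversion correctly (which endianness checkers such as sparse rely on).

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Local copies of the kernel constants, in host byte order (assumed values). */
#define EX_IP_MF     0x2000  /* "more fragments" flag */
#define EX_IP_OFFSET 0x1FFF  /* fragment offset mask */

/*
 * frag_off_be is the 16-bit field exactly as it sits in the IPv4 header,
 * i.e. in network (big-endian) byte order. Returns 1 if the packet is a
 * fragment (MF set or non-zero offset), 0 otherwise.
 */
static int is_fragment(uint16_t frag_off_be)
{
    /* Convert the host-order mask to network order, then compare. */
    uint16_t mask_be = htons(EX_IP_MF | EX_IP_OFFSET);

    assert((EX_IP_MF & EX_IP_OFFSET) == 0); /* flag and offset bits do not overlap */
    return (frag_off_be & mask_be) != 0;
}
```

A frame with only the DF bit (0x4000) set is not reported as a fragment, which is the case the driver treats as CHECKSUM_UNNECESSARY.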

* Re: TCP transmit performance regression
From: Eric Dumazet @ 2012-07-10  8:28 UTC (permalink / raw)
  To: Ming Lei; +Cc: Network Development, David Miller
In-Reply-To: <CACVXFVPgqtSN3BrEXRxSv4yxaxCni495SxZNXBmYQpagmxk2tQ@mail.gmail.com>

On Tue, 2012-07-10 at 15:22 +0800, Ming Lei wrote:

> Kernel stack size is 8KB or more, so could you find process creation failure
> in your ChromeBooks machine at the same time?

I believe you are mixing up a lot of things.

Have you ever heard of socket limits?

Not all available RAM on a machine is there for whoever wants it, thank God.

No: the TCP stack was dropping frames because of socket limits.

Only because skbs were fat (8KB allocated/truesize for a single
1500-byte frame).

If the application is fast and reads skbs as soon as they arrive, no
problem is detected.

But if the application is slow, or a TCP packet is lost on the network,
many packets are queued into the ofo queue. Eventually not enough room is
available -> we drop incoming frames, and the sender has to retransmit them.

So instead of loading your web pages as fast as possible, you have to
wait for retransmits.

So you see nothing at all: no kernel logs, no failed memory allocations.

Only it's slower than necessary.

^ permalink raw reply
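
A rough sketch of the accounting described in the message above, assuming a simplified model in which every queued frame is charged its truesize against a fixed per-socket receive budget. The helper name and the 64KB budget used below are illustrative assumptions, not kernel APIs or defaults.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of frames that can sit in a socket's queues
 * before the budget is exhausted, when each frame is charged `truesize`
 * bytes regardless of how much payload it actually carries.
 */
static size_t frames_before_drop(size_t rcvbuf_limit, size_t truesize)
{
    assert(truesize > 0); /* a frame always costs something */
    return rcvbuf_limit / truesize;
}
```

With an assumed 64KB budget, frames charged 8KB each exhaust it after 8 frames (about 12KB of payload at 1500 bytes per frame), while the same frames charged 2KB each would allow 32. That is why fat skbs lead to drops and retransmits well before the machine runs short of memory.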

* Re: [RFC PATCH] bridge: netfilter: fix skb->nf_bridge NULL panic in br_nf_forward_finish
From: Lin Ming @ 2012-07-10  8:34 UTC (permalink / raw)
  To: Massimo Cetra
  Cc: Massimo Cetra, Eric Dumazet, netdev, Stephen Hemminger,
	David S. Miller, Julian Anastasov
In-Reply-To: <4FFBD289.7050909@navynet.it>

On Tue, Jul 10, 2012 at 2:58 PM, Massimo Cetra <mcetra@navynet.it> wrote:
> On 09/07/2012 14:00, Lin Ming wrote:
>
>>> i spent a couple of days trying to figure out how to reproduce but you
>>> were
>>> quicker and smarter than me.
>>
>>
>> Could you also test it ? :-)
>>
>
> Of course.
>
> I have already installed a 3.5-rc and a 3.2.22 with this patch and, by now,
> i see no problems.
>
> I'm only waiting a couple of days before reporting, to be sure the issue is
> gone.

Then could you reply to the thread below after you confirm the issue is gone?

http://marc.info/?l=linux-netdev&m=134165707424765&w=2

It would be nice to add your "Reported-and-tested-by:".

Thanks,
Lin Ming

>
>
> Massimo

^ permalink raw reply

* Re: [PATCH v2] bridge: netfilter: fix skb->nf_bridge NULL panic in br_nf_forward_finish
From: Simon Horman @ 2012-07-10  8:41 UTC (permalink / raw)
  To: Julian Anastasov
  Cc: Lin Ming, Massimo Cetra, Eric Dumazet, netdev, Stephen Hemminger,
	David S. Miller
In-Reply-To: <alpine.LFD.2.00.1207071322490.5927@ja.ssi.bg>

On Sat, Jul 07, 2012 at 01:27:49PM +0300, Julian Anastasov wrote:
> 
> 	Hello,
> 
> On Sat, 7 Jul 2012, Lin Ming wrote:
> 
> > On Sat, 2012-07-07 at 12:48 +0300, Julian Anastasov wrote:
> > > 
> > > 	Very good. Thanks for tracking and fixing this bug.
> > > Can you send a copy to Simon Horman <horms@verge.net.au>
> > > with correct Subject. As this change can go to stable
> > > kernels you can also improve the comments, for example:
> > > 
> > > ipvs: fix oops on NAT reply in br_nf context
> > > 
> > > 	IPVS should not reset skb->nf_bridge in FORWARD hook
> > > by calling nf_reset for NAT replies. It triggers oops in
> > > br_nf_forward_finish.
> > > 
> > > [here follows your corrected description including
> > > the stack trace]
> > 
> > How about below? Can I have your ACK?
> > I'll resend this patch in another mail.
> 
> 	Very good. You can add my
> 
> Signed-off-by: Julian Anastasov <ja@ssi.bg>

Thanks, I will queue this up in my ipvs tree and see
about getting it included in 3.5

It seems to me that this problem has been present since 2.6.37
and thus is stable material.

^ permalink raw reply

* Re: net-next kernel NULL pointer dereference at fib_rules_tclass
From: Lin Ming @ 2012-07-10  8:42 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: David Miller, netdev, Shlomo Pongratz, Amir Vadai, Erez Shitrit
In-Reply-To: <alpine.LRH.2.00.1207101008270.9760@ogerlitz.voltaire.com>

On Tue, Jul 10, 2012 at 3:16 PM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
> Hi Dave,
>
> Using latest net-next (061a5c316b6526dbc729049a16243ec27937cc31) I
> get the below crash during the boot cycle. The crash happens on a set of
> nodes which use igb for their onboard 1g nic, as soon as the device goes
> up. Another group, that uses a 2nd lab, where the nodes use bnx2 for 1g
> NIC doesn't get this crash, but the kernel there is built by a different
> .config .

Hi,

I got a similar panic, but not at boot time.
I'll look for the cause.

Regards,
Lin Ming

>
> Or.
>
> Bringing up loopback interface:  [  OK  ]
> Bringing up interface eth1:
> Determining IP information for eth1...IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
> igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
> IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
> Starting system logger: BUG: unable to handle kernel NULL pointer dereference at 00000000000000ac
> IP: [<ffffffff81320393>] fib_rules_tclass+0xf/0x17
> PGD 223171067 PUD 22353e067 PMD 0
> Oops: 0000 [#1] SMP
> CPU 0
> Modules linked in:
>  ipv6 dm_mirror dm_region_hash dm_log uinput igb ptp pps_core mlx4_ib ib_mad ib_core mlx4_en mlx4_core sg kvm_intel kvm microcode pcspkr rng_core ioatdma dca shpchp dm_mod button sr_mod ext3 jbd sd_mod usb_storage ata_piix libata scsi_mod ehci_hcd uhci_hcd floppy [last unloaded: scsi_wait_scan]
>
> Pid: 0, comm: swapper/0 Not tainted 3.5.0-rc5-12540-g061a5c3-dirty #94 Supermicro X7DWU/X7DWU
> RIP: 0010:[<ffffffff81320393>]  [<ffffffff81320393>] fib_rules_tclass+0xf/0x17
> RSP: 0018:ffff88022fc03a30  EFLAGS: 00010202
> RAX: 0000000000000000 RBX: ffff88022fc03b54 RCX: 0000000000000050
> RDX: 0000000000000020 RSI: 0000000000000001 RDI: ffff88022fc03a40
> RBP: ffff88022fc03a30 R08: ffff88022fc03a70 R09: ffff88022fc03a40
> R10: 0000000000000020 R11: ffff880225390a80 R12: 0000000000000001
> R13: ffff88021cc7a000 R14: 0000000000000000 R15: ffff8802269c26c0
> FS:  0000000000000000(0000) GS:ffff88022fc00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00000000000000ac CR3: 0000000222aeb000 CR4: 00000000000007f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process swapper/0 (pid: 0, threadinfo ffffffff81600000, task ffffffff81613410)
> Stack:
>  ffff88022fc03ac0 ffffffff81318956 ffff8802fd010010 ffff8802232d5a80
>  ffff880222add880 ffff880223269a98 0000000000000020 ffff880200000000
>  0000000100000000 ffff000000000000 12311eac2540eaf0 ffff88027e001eac
> Call Trace:
>  <IRQ>
>
>  [<ffffffff81318956>] fib_validate_source+0x170/0x2a5
>  [<ffffffff812e6603>] ip_route_input_common+0x6fe/0xd12
>  [<ffffffff812e8380>] ? ip_rcv_finish+0x70/0x457
>  [<ffffffff812e8461>] ip_rcv_finish+0x151/0x457
>  [<ffffffff812e8380>] ? ip_rcv_finish+0x70/0x457
>  [<ffffffff812e89a1>] ip_rcv+0x23a/0x260
>  [<ffffffff812beae7>] __netif_receive_skb+0x3ac/0x415
>  [<ffffffff812be86f>] ? __netif_receive_skb+0x134/0x415
>  [<ffffffff81312ae5>] ? inet_gro_receive+0x81/0x23f
>  [<ffffffff812b68da>] ? skb_free_head+0x47/0x49
>  [<ffffffff812c035d>] netif_receive_skb+0xee/0xf7
>  [<ffffffff812c071d>] ? dev_gro_receive+0x15f/0x2fb
>  [<ffffffff812c063a>] ? dev_gro_receive+0x7c/0x2fb
>  [<ffffffff81065644>] ? trace_hardirqs_on+0xd/0xf
>  [<ffffffff812c044c>] napi_skb_finish+0x24/0x56
>  [<ffffffff812c0bf0>] napi_gro_receive+0x10f/0x11e
>  [<ffffffffa0216e85>] igb_poll+0x843/0xae5 [igb]
>  [<ffffffff812c0e01>] ? net_rx_action+0x14c/0x1ee
>  [<ffffffff812c0d76>] net_rx_action+0xc1/0x1ee
>  [<ffffffff8102f746>] __do_softirq+0xff/0x1de
>  [<ffffffff813631cc>] call_softirq+0x1c/0x26
>  [<ffffffff81003090>] do_softirq+0x38/0x80
>  [<ffffffff8102f41f>] irq_exit+0x4e/0x83
>  [<ffffffff810028f9>] do_IRQ+0x98/0xaf
>  [<ffffffff8135b52c>] common_interrupt+0x6c/0x6c
>  <EOI>
>
>  [<ffffffff810083ec>] ? mwait_idle+0x13c/0x208
>  [<ffffffff810083e3>] ? mwait_idle+0x133/0x208
>  [<ffffffff810088d1>] cpu_idle+0x6e/0xab
>  [<ffffffff81343e13>] rest_init+0xc7/0xce
>  [<ffffffff81343d4c>] ? csum_partial_copy_generic+0x16c/0x16c
>  [<ffffffff8167fbf3>] start_kernel+0x332/0x33f
>  [<ffffffff8167f6f6>] ? kernel_init+0x19d/0x19d
>  [<ffffffff8167f2b4>] x86_64_start_reservations+0xb8/0xbd
>  [<ffffffff8167f3a6>] x86_64_start_kernel+0xed/0xf4
> Code: 81 31 c0 e8 a5 bb dd ff 48 83 c4 28 31 c0 5b 41 5c 41 5d 41 5e 41 5f c9 c3 90 90 90 48 8b 57 20 55 31 c0 48 89 e5 48 85 d2 74 06 <8b> 82 8c 00 00 00 c9 c3 8b 47 7c 33 46 14 85 87 80 00 00 00 55
> RIP  [<ffffffff81320393>] fib_rules_tclass+0xf/0x17
>  RSP <ffff88022fc03a30>
> CR2: 00000000000000ac
> ---[ end trace e7c6714b8de1c341 ]---
> Kernel panic - not syncing: Fatal exception in interrupt

^ permalink raw reply

* Re: [PATCH] ipvs: fix oops on NAT reply in br_nf context
From: Simon Horman @ 2012-07-10  8:51 UTC (permalink / raw)
  To: Lin Ming
  Cc: Julian Anastasov, Massimo Cetra, Eric Dumazet, David S. Miller,
	netdev
In-Reply-To: <1341656770.8543.3.camel@chief-river-32>

On Sat, Jul 07, 2012 at 06:26:10PM +0800, Lin Ming wrote:
> IPVS should not reset skb->nf_bridge in FORWARD hook
> by calling nf_reset for NAT replies. It triggers oops in
> br_nf_forward_finish.
> 
> [  579.781508] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
> [  579.781669] IP: [<ffffffff817b1ca5>] br_nf_forward_finish+0x58/0x112
> [  579.781792] PGD 218f9067 PUD 0 
> [  579.781865] Oops: 0000 [#1] SMP 
> [  579.781945] CPU 0 
> [  579.781983] Modules linked in:
> [  579.782047] 
> [  579.782080] 
> [  579.782114] Pid: 4644, comm: qemu Tainted: G        W    3.5.0-rc5-00006-g95e69f9 #282 Hewlett-Packard  /30E8
> [  579.782300] RIP: 0010:[<ffffffff817b1ca5>]  [<ffffffff817b1ca5>] br_nf_forward_finish+0x58/0x112
> [  579.782455] RSP: 0018:ffff88007b003a98  EFLAGS: 00010287
> [  579.782541] RAX: 0000000000000008 RBX: ffff8800762ead00 RCX: 000000000001670a
> [  579.782653] RDX: 0000000000000000 RSI: 000000000000000a RDI: ffff8800762ead00
> [  579.782845] RBP: ffff88007b003ac8 R08: 0000000000016630 R09: ffff88007b003a90
> [  579.782957] R10: ffff88007b0038e8 R11: ffff88002da37540 R12: ffff88002da01a02
> [  579.783066] R13: ffff88002da01a80 R14: ffff88002d83c000 R15: ffff88002d82a000
> [  579.783177] FS:  0000000000000000(0000) GS:ffff88007b000000(0063) knlGS:00000000f62d1b70
> [  579.783306] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
> [  579.783395] CR2: 0000000000000004 CR3: 00000000218fe000 CR4: 00000000000027f0
> [  579.783505] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  579.783684] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  579.783795] Process qemu (pid: 4644, threadinfo ffff880021b20000, task ffff880021aba760)
> [  579.783919] Stack:
> [  579.783959]  ffff88007693cedc ffff8800762ead00 ffff88002da01a02 ffff8800762ead00
> [  579.784110]  ffff88002da01a02 ffff88002da01a80 ffff88007b003b18 ffffffff817b26c7
> [  579.784260]  ffff880080000000 ffffffff81ef59f0 ffff8800762ead00 ffffffff81ef58b0
> [  579.784477] Call Trace:
> [  579.784523]  <IRQ> 
> [  579.784562] 
> [  579.784603]  [<ffffffff817b26c7>] br_nf_forward_ip+0x275/0x2c8
> [  579.784707]  [<ffffffff81704b58>] nf_iterate+0x47/0x7d
> [  579.784797]  [<ffffffff817ac32e>] ? br_dev_queue_push_xmit+0xae/0xae
> [  579.784906]  [<ffffffff81704bfb>] nf_hook_slow+0x6d/0x102
> [  579.784995]  [<ffffffff817ac32e>] ? br_dev_queue_push_xmit+0xae/0xae
> [  579.785175]  [<ffffffff8187fa95>] ? _raw_write_unlock_bh+0x19/0x1b
> [  579.785179]  [<ffffffff817ac417>] __br_forward+0x97/0xa2
> [  579.785179]  [<ffffffff817ad366>] br_handle_frame_finish+0x1a6/0x257
> [  579.785179]  [<ffffffff817b2386>] br_nf_pre_routing_finish+0x26d/0x2cb
> [  579.785179]  [<ffffffff817b2cf0>] br_nf_pre_routing+0x55d/0x5c1
> [  579.785179]  [<ffffffff81704b58>] nf_iterate+0x47/0x7d
> [  579.785179]  [<ffffffff817ad1c0>] ? br_handle_local_finish+0x44/0x44
> [  579.785179]  [<ffffffff81704bfb>] nf_hook_slow+0x6d/0x102
> [  579.785179]  [<ffffffff817ad1c0>] ? br_handle_local_finish+0x44/0x44
> [  579.785179]  [<ffffffff81551525>] ? sky2_poll+0xb35/0xb54
> [  579.785179]  [<ffffffff817ad62a>] br_handle_frame+0x213/0x229
> [  579.785179]  [<ffffffff817ad417>] ? br_handle_frame_finish+0x257/0x257
> [  579.785179]  [<ffffffff816e3b47>] __netif_receive_skb+0x2b4/0x3f1
> [  579.785179]  [<ffffffff816e69fc>] process_backlog+0x99/0x1e2
> [  579.785179]  [<ffffffff816e6800>] net_rx_action+0xdf/0x242
> [  579.785179]  [<ffffffff8107e8a8>] __do_softirq+0xc1/0x1e0
> [  579.785179]  [<ffffffff8135a5ba>] ? trace_hardirqs_off_thunk+0x3a/0x6c
> [  579.785179]  [<ffffffff8188812c>] call_softirq+0x1c/0x30
> 
> The steps to reproduce as follow,
> 
> 1. On Host1, setup brige br0(192.168.1.106)
> 2. Boot a kvm guest(192.168.1.105) on Host1 and start httpd
> 3. Start IPVS service on Host1
>    ipvsadm -A -t 192.168.1.106:80 -s rr
>    ipvsadm -a -t 192.168.1.106:80 -r 192.168.1.105:80 -m
> 4. Run apache benchmark on Host2(192.168.1.101)
>    ab -n 1000 http://192.168.1.106/
> 
> ip_vs_reply4
>   ip_vs_out
>     handle_response
>       ip_vs_notrack
>         nf_reset()
>         {
>           skb->nf_bridge = NULL;
>         }
> 
> Actually, IPVS wants in this case just to replace nfct
> with untracked version. So replace the nf_reset(skb) call
> in ip_vs_notrack() with a nf_conntrack_put(skb->nfct) call.
> 
> Signed-off-by: Lin Ming <mlin@ss.pku.edu.cn>
> Signed-off-by: Julian Anastasov <ja@ssi.bg>

Actually, I'll queue up this version for 3.5 rather than the previous one
as it has a better title.

As per my previous comment (repeated here for reference) it seems to me
that this problem has been present since 2.6.37 and thus is stable material.

^ permalink raw reply
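
The difference between the two approaches in the patch description above can be sketched with a mock structure. Everything here is hypothetical: struct ex_skb and both helpers are stand-ins written for illustration, reference counting is omitted, and the real code would go on to attach the untracked conntrack entry afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Mock of the two skb fields that matter here (not the real sk_buff). */
struct ex_skb {
    void *nfct;      /* conntrack reference */
    void *nf_bridge; /* bridge netfilter state, needed later by br_netfilter */
};

/* Models the old behaviour: a full reset clears the bridge state too. */
static void ex_nf_reset(struct ex_skb *skb)
{
    assert(skb != NULL);
    skb->nfct = NULL;
    skb->nf_bridge = NULL;
}

/* Models the fix: release only the conntrack reference. */
static void ex_drop_conntrack_only(struct ex_skb *skb)
{
    assert(skb != NULL);
    skb->nfct = NULL;
}
```

After ex_drop_conntrack_only() the nf_bridge pointer is still valid, so a later stage that dereferences it (as br_nf_forward_finish does in the trace) does not hit NULL.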

* Re: [PATCH] net: cgroup: fix access the unallocated memory in netprio cgroup
From: Gao feng @ 2012-07-10  8:53 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: nhorman, davem, linux-kernel, netdev, lizefan, tj, Eric Dumazet
In-Reply-To: <1341893650.3265.3974.camel@edumazet-glaptop>

> Hi Gao
> 
> Is it still needed to call update_netdev_tables() from write_priomap() ?
> 

Yes, I think it's needed, because read_priomap will show all of the net devices.

But we may add a netdev after creating a netprio cgroup, so the newly added netdev's
priomap will not be allocated. If we don't call update_netdev_tables in write_priomap,
we may access this unallocated memory.

^ permalink raw reply

* [PATCH] tc: filter: validate filter priority in userspace.
From: Li Wei @ 2012-07-10  8:45 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

Because we use the high 16 bits of tcm_info to pass the prio value to
the kernel, its range is [0, 0xffff]. Without validation in tc, when
the user passes a larger (>65535) priority, the actual priority
set in the kernel would confuse the user.

So, add validation to ensure prio is within that range.
---
 tc/tc_filter.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tc/tc_filter.c b/tc/tc_filter.c
index 207302f..04c3b82 100644
--- a/tc/tc_filter.c
+++ b/tc/tc_filter.c
@@ -105,7 +105,7 @@ int tc_filter_modify(int cmd, unsigned flags, int argc, char **argv)
 			NEXT_ARG();
 			if (prio)
 				duparg("priority", *argv);
-			if (get_u32(&prio, *argv, 0))
+			if (get_u32(&prio, *argv, 0) || prio > 0xFFFF)
 				invarg(*argv, "invalid priority value");
 		} else if (matches(*argv, "protocol") == 0) {
 			__u16 id;
-- 
1.7.1

^ permalink raw reply related
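
The truncation that the tc patch above guards against can be sketched as follows. This assumes prio is packed into the upper 16 bits of a 32-bit tcm_info word with the protocol in the lower 16 bits, as the description says. The EX_ masks and helper names are local illustrations, not the actual tc or kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

#define EX_MAJ_MASK 0xFFFF0000U /* upper 16 bits: priority */
#define EX_MIN_MASK 0x0000FFFFU /* lower 16 bits: protocol */

/* Pack without any range check: bits of prio above 0xFFFF are lost. */
static uint32_t pack_tcm_info(uint32_t prio, uint16_t protocol)
{
    return ((prio << 16) & EX_MAJ_MASK) | (protocol & EX_MIN_MASK);
}

/* What the receiving side would read back as the priority. */
static uint32_t unpack_prio(uint32_t tcm_info)
{
    return tcm_info >> 16;
}

/* The check added by the patch, expressed as a predicate. */
static int prio_is_valid(uint32_t prio)
{
    return prio <= 0xFFFF;
}

/* Pack only after the range check has passed. */
static uint32_t pack_tcm_info_checked(uint32_t prio, uint16_t protocol)
{
    assert(prio_is_valid(prio));
    return pack_tcm_info(prio, protocol);
}
```

Under these assumptions a requested priority of 65541 (0x10005) reads back as 5, which is the confusing result the validation rejects up front.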

* Re: net-next kernel NULL pointer dereference at fib_rules_tclass
From: David Miller @ 2012-07-10  9:00 UTC (permalink / raw)
  To: mlin; +Cc: ogerlitz, netdev, shlomop, amirv, erezsh
In-Reply-To: <CAF1ivSbw50US9dPxs63C8_hjdBq6K6_He7_Foi5bW1MvefunHw@mail.gmail.com>

From: Lin Ming <mlin@ss.pku.edu.cn>
Date: Tue, 10 Jul 2012 16:42:29 +0800

> On Tue, Jul 10, 2012 at 3:16 PM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
>> Hi Dave,
>>
>> Using latest net-next (061a5c316b6526dbc729049a16243ec27937cc31) I
>> get the below crash during the boot cycle. The crash happens on a set of
>> nodes which use igb for their onboard 1g nic, as soon as the device goes
>> up. Another group, that uses a 2nd lab, where the nodes use bnx2 for 1g
>> NIC doesn't get this crash, but the kernel there is built by a different
>> .config .
> 
> Hi,
> 
> I got similar panic, but not at boot time.
> I'll look for the cause.

Don't worry about it, I am sure that I added this bug and therefore
I will fix it.

^ permalink raw reply

* Re: [PATCH net-next 6/6] r8169: support RTL8168G
From: Francois Romieu @ 2012-07-10  9:00 UTC (permalink / raw)
  To: Hayes Wang; +Cc: netdev, linux-kernel
In-Reply-To: <1341904369-5277-1-git-send-email-hayeswang@realtek.com>

(you should include a Signed-off-by)

Hayes Wang <hayeswang@realtek.com> :
> 1. Remove rtl_ocpdr_cond. No waiting is needed for mac_ocp_{write / read}.

Nit: it would not hurt to do a better job than me and save some commit noise
getting these things right before they pollute the history. :o)

> 2. Set ocp_base to OCP_STD_PHY_BASE after rtl8168g_1_hw_phy_config.

Can't it be stuffed into the firmware ?

The code does not explicitly switch from the PHY access context to
the extra OCP registers one and anything else in rtl8168g_1_hw_phy_config
seems to directly use the addresses it needs. So I'd expect the current
imbalance to come from the firmware, where it would make as much sense to
fix it
-> no imbalance after the firmware is applied
-> no useless instruction if the firmware is not used

-- 
Ueimor

^ permalink raw reply

* Re: [RFC PATCH net-next] ipvs: add missing lock in ip_vs_ftp_init_conn()
From: Simon Horman @ 2012-07-10  9:05 UTC (permalink / raw)
  To: Julian Anastasov
  Cc: Xiaotian Feng, netdev, lvs-devel, netfilter-devel, netfilter,
	coreteam, linux-kernel, Xiaotian Feng, Wensong Zhang,
	Pablo Neira Ayuso, Patrick McHardy, David S. Miller
In-Reply-To: <alpine.LFD.2.00.1207030952340.1749@ja.ssi.bg>

On Tue, Jul 03, 2012 at 10:12:41AM +0300, Julian Anastasov wrote:
> 
> 	Hello,
> 
> On Thu, 28 Jun 2012, Xiaotian Feng wrote:
> 
> > We met a kernel panic in 2.6.32.43 kernel:
> > 
> > [2680191.848044] IPVS: ip_vs_conn_hash(): request for already hashed, called from run_timer_softirq+0x175/0x1d0
> > <snip>
> > [2680311.849009] general protection fault: 0000 [#1] SMP
> > [2680311.853001] RIP: 0010:[<ffffffff815f155c>]  [<ffffffff815f155c>] ip_vs_conn_expire+0xdc/0x2f0
> > [2680311.853001] RSP: 0018:ffff880028303e70  EFLAGS: 00010202
> > [2680311.853001] RAX: dead000000200200 RBX: ffff8801aad00b80 RCX: 0000000000001d90
> > [2680311.853001] RDX: dead000000100100 RSI: 000000004fd59800 RDI: ffff8801aad00c08
> > <snip>
> > [2680311.853001] Call Trace:
> > [2680311.853001]  <IRQ>
> > [2680311.853001]  [<ffffffff815f1480>] ? ip_vs_conn_expire+0x0/0x2f0
> > [2680311.853001]  [<ffffffff8104e2a5>] run_timer_softirq+0x175/0x1d0
> > [2680311.853001]  [<ffffffff81021a48>] ? lapic_next_event+0x18/0x20
> > [2680311.853001]  [<ffffffff81049a13>] __do_softirq+0xb3/0x150
> > [2680311.853001]  [<ffffffff8100cc5c>] call_softirq+0x1c/0x30
> > [2680311.853001]  [<ffffffff8100ea9a>] do_softirq+0x4a/0x80
> > [2680311.853001]  [<ffffffff81049957>] irq_exit+0x77/0x80
> > [2680311.853001]  [<ffffffff81021f2c>] smp_apic_timer_interrupt+0x6c/0xa0
> > [2680311.853001]  [<ffffffff8100c633>] apic_timer_interrupt+0x13/0x20
> > [2680311.853001]  <EOI>
> > [2680311.853001]  [<ffffffff81013b52>] ? mwait_idle+0x52/0x70
> > [2680311.853001]  [<ffffffff8100a7b0>] ? enter_idle+0x20/0x30
> > [2680311.853001]  [<ffffffff8100ac62>] ? cpu_idle+0x52/0x80
> > [2680311.853001]  [<ffffffff816d504d>] ? start_secondary+0x19d/0x280
> > 
> > rax and rdx is LIST_POISON1 and LIST_POISON2, so kernel is list_del() on an already deleted
> > connection and result the general protect fault.
> > 
> > The "request for already hashed" warning, told us someone might change the connection flags
> > incorrectly, like described in commit aea9d711, it changes the connection flags, but doesn't
> > put the connection back to the list. So ip_vs_conn_hash() throw a warning and return.
> > Later, when ip_vs_conn_expire fire again, ip_vs_conn_unhash() will find the HASHED connection
> > and list_del() it, then kernel panic happened.
> > 
> > After code review, the only chance that kernel change connection flag without protection is
> > in ip_vs_ftp_init_conn().
> > 
> > Signed-off-by: Xiaotian Feng <dannyfeng@tencent.com>
> > Cc: Wensong Zhang <wensong@linux-vs.org>
> > Cc: Simon Horman <horms@verge.net.au>
> > Cc: Julian Anastasov <ja@ssi.bg>
> > Cc: Pablo Neira Ayuso <pablo@netfilter.org>
> > Cc: Patrick McHardy <kaber@trash.net>
> > Cc: "David S. Miller" <davem@davemloft.net> 
> 
> 	For the fix below:
> 
> Acked-by: Julian Anastasov <ja@ssi.bg>
> 
> 	Simon, the change looks ok. ip_vs_ftp_init_conn is called
> from context where cp->lock is not locked (no double lock), so it
> should be safe for the backup.
> 
> 	Only that the comment is not specifying that we
> fix a problem in the backup server.

Thanks.

I have pushed this to my ipvs branch and will see about getting it included in 3.5.

It appears that this problem has been present since (at least) 2.6.37 and
my feeling is that it is -stable material.


^ permalink raw reply
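
The double list_del() described in the quoted report above can be sketched in userspace C. This is a minimal model, not the kernel's list implementation: the ex_ names are hypothetical, and two static sentinel objects stand in for LIST_POISON1 and LIST_POISON2. In the kernel those are fixed invalid addresses (the dead000000100100 and dead000000200200 values visible in RDX and RAX of the oops), so a second deletion faults instead of being detected.

```c
#include <assert.h>
#include <stddef.h>

struct ex_node {
    struct ex_node *next, *prev;
};

/* Stand-ins for the kernel's poison values (assumption for this sketch). */
static struct ex_node ex_poison_next, ex_poison_prev;

static void ex_list_init(struct ex_node *head)
{
    head->next = head->prev = head;
}

/* Insert n right after head. */
static void ex_list_add(struct ex_node *n, struct ex_node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

static int ex_is_poisoned(const struct ex_node *n)
{
    return n->next == &ex_poison_next || n->prev == &ex_poison_prev;
}

/* Unlink n and poison its pointers, as list_del() does. */
static void ex_list_del(struct ex_node *n)
{
    assert(!ex_is_poisoned(n)); /* deleting twice is the bug in question */
    n->next->prev = n->prev;
    n->prev->next = n->next;
    n->next = &ex_poison_next;
    n->prev = &ex_poison_prev;
}

/* Returns 1 if n was unlinked, 0 if it had already been deleted. */
static int ex_list_del_once(struct ex_node *n)
{
    if (ex_is_poisoned(n))
        return 0;
    ex_list_del(n);
    return 1;
}
```

The sketch only shows why the second deletion touches poisoned pointers. The fix discussed in the thread instead prevents the connection flags from being changed without holding the lock, so the hashed state and the list membership cannot disagree in the first place.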

* Re: [PATCH] net: cgroup: fix access the unallocated memory in netprio cgroup
From: Eric Dumazet @ 2012-07-10  9:15 UTC (permalink / raw)
  To: Gao feng; +Cc: nhorman, davem, linux-kernel, netdev, lizefan, tj, Eric Dumazet
In-Reply-To: <4FFBED84.1030905@cn.fujitsu.com>

On Tue, 2012-07-10 at 16:53 +0800, Gao feng wrote:
> > Hi Gao
> > 
> > Is it still needed to call update_netdev_tables() from write_priomap() ?
> > 
> 
> Yes, I think it's needed,because read_priomap will show all of the net devices,
> 
> But we may add the netdev after create a netprio cgroup, so the new added netdev's
> priomap will not be allocated. if we don't call update_netdev_tables in write_priomap,
> we may access this unallocated memory.
> 

I realize my question was not clear.

If we write in write_priomap() a field of a single netdevice,
why should we allocate memory for all netdevices on the machine ?

So the question was: do we really need to call
update_netdev_tables(alldevs), instead of extend_netdev_table(dev)?

^ permalink raw reply

* Re: [GIT PULL net] IPVS
From: Simon Horman @ 2012-07-10  9:20 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Hans Schillstrom, Jesper Dangaard Brouer
In-Reply-To: <20120430092722.GA6866@1984>

On Mon, Apr 30, 2012 at 11:27:22AM +0200, Pablo Neira Ayuso wrote:
> On Fri, Apr 27, 2012 at 09:53:54AM +0900, Simon Horman wrote:
> > Hi Pablo,
> > 
> > please consider the following 5 changes for 3.4, they are all bug fixes.
> > I would also like these changes considered for stable.
> 
> Please, ping me again once these have hit Linus tree to ask for
> -stable submission.

Sorry for letting this slip through the cracks.

Please consider the following commits which are in Linus's tree for stable.
Or I can submit them directly if that is easier.

There are 7 patches listed below. The first 5 were the patches in this
pull request. The last two were patches in a git pull request
a few days earlier.


commit 8537de8a7ab6681cc72fb0411ab1ba7fdba62dd0
Author: Hans Schillstrom <hans.schillstrom@ericsson.com>
Date:   Thu Apr 26 07:47:44 2012 +0200

    ipvs: kernel oops - do_ip_vs_get_ctl
    
    Change order of init so netns init is ready
    when register ioctl and netlink.
    
    Ver2
    	Whitespace fixes and __init added.
    
    Reported-by: "Ryan O'Hara" <rohara@redhat.com>
    Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
    Acked-by: Julian Anastasov <ja@ssi.bg>
    Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Signed-off-by: Simon Horman <horms@verge.net.au>

commit 582b8e3eadaec77788c1aa188081a8d5059c42a6
Author: Hans Schillstrom <hans.schillstrom@ericsson.com>
Date:   Thu Apr 26 09:45:35 2012 +0200

    ipvs: take care of return value from protocol init_netns
    
    ip_vs_create_timeout_table() can return NULL
    All functions protocol init_netns is affected of this patch.
    
    Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
    Acked-by: Julian Anastasov <ja@ssi.bg>
    Signed-off-by: Simon Horman <horms@verge.net.au>

commit 4b984cd50bc1b6d492175cd77bfabb78e76ffa67
Author: Hans Schillstrom <hans.schillstrom@ericsson.com>
Date:   Thu Apr 26 09:45:34 2012 +0200

    ipvs: null check of net->ipvs in lblc(r) shedulers
    
    Avoid crash when registering shedulers after
    the IPVS core initialization for netns fails. Do this by
    checking for present core (net->ipvs).
    
    Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
    Acked-by: Julian Anastasov <ja@ssi.bg>
    Signed-off-by: Simon Horman <horms@verge.net.au>

commit 39f618b4fd95ae243d940ec64c961009c74e3333
Author: Julian Anastasov <ja@ssi.bg>
Date:   Wed Apr 25 00:29:58 2012 +0300

    ipvs: reset ipvs pointer in netns
    
    	Make sure net->ipvs is reset on netns cleanup or failed
    initialization. It is needed for IPVS applications to know that
    IPVS core is not loaded in netns.
    
    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Acked-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
    Signed-off-by: Simon Horman <horms@verge.net.au>

commit 8d08d71ce59438a6ef06be5db07966e0c144b74e
Author: Julian Anastasov <ja@ssi.bg>
Date:   Wed Apr 25 00:29:59 2012 +0300

    ipvs: add check in ftp for initialized core
    
    	Avoid crash when registering ip_vs_ftp after
    the IPVS core initialization for netns fails. Do this by
    checking for present core (net->ipvs).
    
    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Acked-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
    Signed-off-by: Simon Horman <horms@verge.net.au>

commit 8f9b9a2fad47af27e14b037395e03cd8278d96d7
Author: Julian Anastasov <ja@ssi.bg>
Date:   Fri Apr 13 18:08:43 2012 +0300

    ipvs: fix crash in ip_vs_control_net_cleanup on unload
    
    	commit 14e405461e664b777e2a5636e10b2ebf36a686ec (2.6.39)
    ("Add __ip_vs_control_{init,cleanup}_sysctl()")
    introduced regression due to wrong __net_init for
    __ip_vs_control_cleanup_sysctl. This leads to crash when
    the ip_vs module is unloaded.
    
    	Fix it by changing __net_init to __net_exit for
    the function that is already renamed to ip_vs_control_net_cleanup_sysctl.
    
    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
    Signed-off-by: Simon Horman <horms@verge.net.au>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

commit 7118c07a844d367560ee91adb2071bde2fabcdbf
Author: Sasha Levin <levinsasha928@gmail.com>
Date:   Sat Apr 14 12:37:46 2012 -0400

    ipvs: Verify that IP_VS protocol has been registered
    
    The registration of a protocol might fail, there were no checks
    and all registrations were assumed to be correct. This lead to
    NULL ptr dereferences when apps tried registering.
    
    For example:
    
    [ 1293.226051] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
    [ 1293.227038] IP: [<ffffffff822aacb0>] tcp_register_app+0x60/0xb0
    [ 1293.227038] PGD 391de067 PUD 6c20b067 PMD 0
    [ 1293.227038] Oops: 0000 [#1] PREEMPT SMP
    [ 1293.227038] CPU 1
    [ 1293.227038] Pid: 19609, comm: trinity Tainted: G        W    3.4.0-rc1-next-20120405-sasha-dirty #57
    [ 1293.227038] RIP: 0010:[<ffffffff822aacb0>]  [<ffffffff822aacb0>] tcp_register_app+0x60/0xb0
    [ 1293.227038] RSP: 0018:ffff880038c1dd18  EFLAGS: 00010286
    [ 1293.227038] RAX: ffffffffffffffc0 RBX: 0000000000001500 RCX: 0000000000010000
    [ 1293.227038] RDX: 0000000000000000 RSI: ffff88003a2d5888 RDI: 0000000000000282
    [ 1293.227038] RBP: ffff880038c1dd48 R08: 0000000000000000 R09: 0000000000000000
    [ 1293.227038] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003a2d5668
    [ 1293.227038] R13: ffff88003a2d5988 R14: ffff8800696a8ff8 R15: 0000000000000000
    [ 1293.227038] FS:  00007f01930d9700(0000) GS:ffff88007ce00000(0000) knlGS:0000000000000000
    [ 1293.227038] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [ 1293.227038] CR2: 0000000000000018 CR3: 0000000065dfc000 CR4: 00000000000406e0
    [ 1293.227038] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 1293.227038] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    [ 1293.227038] Process trinity (pid: 19609, threadinfo ffff880038c1c000, task ffff88002dc73000)
    [ 1293.227038] Stack:
    [ 1293.227038]  ffff880038c1dd48 00000000fffffff4 ffff8800696aada0 ffff8800694f5580
    [ 1293.227038]  ffffffff8369f1e0 0000000000001500 ffff880038c1dd98 ffffffff822a716b
    [ 1293.227038]  0000000000000000 ffff8800696a8ff8 0000000000000015 ffff8800694f5580
    [ 1293.227038] Call Trace:
    [ 1293.227038]  [<ffffffff822a716b>] ip_vs_app_inc_new+0xdb/0x180
    [ 1293.227038]  [<ffffffff822a7258>] register_ip_vs_app_inc+0x48/0x70
    [ 1293.227038]  [<ffffffff822b2fea>] __ip_vs_ftp_init+0xba/0x140
    [ 1293.227038]  [<ffffffff821c9060>] ops_init+0x80/0x90
    [ 1293.227038]  [<ffffffff821c90cb>] setup_net+0x5b/0xe0
    [ 1293.227038]  [<ffffffff821c9416>] copy_net_ns+0x76/0x100
    [ 1293.227038]  [<ffffffff810dc92b>] create_new_namespaces+0xfb/0x190
    [ 1293.227038]  [<ffffffff810dca21>] unshare_nsproxy_namespaces+0x61/0x80
    [ 1293.227038]  [<ffffffff810afd1f>] sys_unshare+0xff/0x290
    [ 1293.227038]  [<ffffffff8187622e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
    [ 1293.227038]  [<ffffffff82665539>] system_call_fastpath+0x16/0x1b
    [ 1293.227038] Code: 89 c7 e8 34 91 3b 00 89 de 66 c1 ee 04 31 de 83 e6 0f 48 83 c6 22 48 c1 e6 04 4a 8b 14 26 49 8d 34 34 48 8d 42 c0 48 39 d6 74 13 <66> 39 58 58 74 22 48 8b 48 40 48 8d 41 c0 48 39 ce 75 ed 49 8d
    [ 1293.227038] RIP  [<ffffffff822aacb0>] tcp_register_app+0x60/0xb0
    [ 1293.227038]  RSP <ffff880038c1dd18>
    [ 1293.227038] CR2: 0000000000000018
    [ 1293.379284] ---[ end trace 364ab40c7011a009 ]---
    [ 1293.381182] Kernel panic - not syncing: Fatal exception in interrupt
    
    Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
    Acked-by: Julian Anastasov <ja@ssi.bg>
    Signed-off-by: Simon Horman <horms@verge.net.au>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

