* Re: [PATCH v2] net: bnx2x: convert to hw_features
From: Vladislav Zolotarov @ 2011-04-12 7:46 UTC (permalink / raw)
To: Michał Mirosław; +Cc: netdev@vger.kernel.org, Eilon Greenstein
In-Reply-To: <1302593208.32697.18.camel@lb-tlvb-vladz>
> > In all those cases, bnx2x_reload_if_running() will be called only when
> > LRO state is changed while there's a recovery in progress.
>
> Hmmm... And what about all other features from hw_features? What if they
> have changed (in wanted_features) while recovery was in progress?
> According to the __netdev_update_features() code it will invoke
> ndo_set_features() in these cases too. Am I missing something here?
I think I understand what you meant. So, yes, bnx2x_nic_load() is
called only if TPA_ENABLED_FLAG in bp->flags has changed. And this can
happen if either NETIF_F_LRO has changed while NETIF_F_RXCSUM was set or
if NETIF_F_LRO was set and NETIF_F_RXCSUM is being cleared.
thanks,
vlad
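A minimal user-space sketch of the condition described above, assuming TPA is effectively on only when both LRO and Rx checksum offload are enabled. The F_* bit values and the function names are hypothetical stand-ins for illustration; they are not the kernel's NETIF_F_* definitions or actual bnx2x code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in bits; the real NETIF_F_* values differ. */
#define F_RXCSUM (1u << 0)
#define F_LRO    (1u << 1)

/* Assumption: TPA is active only when LRO and Rx csum offload are both set. */
static bool tpa_enabled(uint32_t features)
{
	return (features & F_LRO) && (features & F_RXCSUM);
}

/* A reload is needed only when the effective TPA state flips. */
static bool needs_reload(uint32_t old_features, uint32_t new_features)
{
	return tpa_enabled(old_features) != tpa_enabled(new_features);
}
```

With this model, toggling LRO while RXCSUM is set, or clearing RXCSUM while LRO is set, reports a reload; changing any other bit does not.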
^ permalink raw reply
* Re: Kernel panic when using bridge
From: Eric Dumazet @ 2011-04-12 7:31 UTC (permalink / raw)
To: Scot Doyle; +Cc: Stephen Hemminger, Hiroaki SHIMODA, netdev
In-Reply-To: <4DA3F909.5020609@scotdoyle.com>
On Tuesday, 12 April 2011 at 02:02 -0500, Scot Doyle wrote:
> On 04/12/2011 12:51 AM, Eric Dumazet wrote:
> >
> > Oh well, sorry (not enough time these days to even test patches)
> >
> > if (!skb_dst(skb)) {
>
> --- br_netfilter.c.a 2011-04-01 02:37:53.000000000 -0500
> +++ br_netfilter.c.b 2011-04-12 00:29:00.000000000 -0500
> @@ -221,6 +221,7 @@ static int br_parse_ip_options(struct sk
> struct ip_options *opt;
> struct iphdr *iph;
> struct net_device *dev = skb->dev;
> + struct rtable *rt;
> u32 len;
>
> iph = ip_hdr(skb);
> @@ -255,6 +256,16 @@ static int br_parse_ip_options(struct sk
> return 0;
> }
>
> + /* Associate bogus bridge route table */
> + if (!skb_dst(skb)) {
> + rt = bridge_parent_rtable(dev);
> + if (!rt) {
> + kfree_skb(skb);
> + return 0;
> + }
> + skb_dst_set_noref(skb,&rt->dst);
> + }
> +
> opt->optlen = iph->ihl*4 - sizeof(struct iphdr);
> if (ip_options_compile(dev_net(dev), opt, skb))
> goto inhdr_error;
>
>
> Now we are making progress! With the patch above from Stephen and Eric,
> I cannot make the kernel panic when sending packets to the IP address of
> the bridge.
>
> However, if a guest virtual machine is sharing the bridge with the host
> via a tap device, I can cause a host panic by targeting the IP address
> of the guest. Is this an unrelated problem?
>
> Here are two kernel panics. The guest virtual machine was pingable
> before being attacked with IP Stack Checker's tcpsic command. Spanning
> Tree Protocol was off during the first panic and on during the second.
>
I wonder if you are not running out of free stack space...
And it might be because of inet_getpeer() calling cleanup_once()
# objdump64 -d net/ipv4/inetpeer.o | scripts/checkstack.pl
0x0317 cleanup_once [inetpeer.o]: 344
0x03d6 cleanup_once [inetpeer.o]: 344
0x0680 inet_getpeer [inetpeer.o]: 344
0x071d inet_getpeer [inetpeer.o]: 344
0x0004 inet_initpeers [inetpeer.o]: 112
^ permalink raw reply
* Re: [PATCH v2] net: bnx2x: convert to hw_features
From: Vladislav Zolotarov @ 2011-04-12 7:26 UTC (permalink / raw)
To: Michał Mirosław; +Cc: netdev@vger.kernel.org, Eilon Greenstein
In-Reply-To: <20110411201225.GA9249@rere.qmqm.pl>
On Mon, 2011-04-11 at 13:12 -0700, Michał Mirosław wrote:
> The v3 patch fixes missing LRO flag and ensures that netdev_update_features()
> won't be called after failed bnx2x_nic_load(). More comments below.
Since there is already a v4, I'll comment on it and skip v3. See a few
comments on your comments below. ;)
>
> On Mon, Apr 11, 2011 at 05:10:21PM +0300, Vladislav Zolotarov wrote:
> > On Sun, 2011-04-10 at 08:35 -0700, Michał Mirosław wrote:
> > > Since ndo_fix_features callback is postponing features change when
> > > bp->recovery_state != BNX2X_RECOVERY_DONE, netdev_update_features()
> > > has to be called again when this condition changes.
> > Unfortunately, NACK again. See below, pls.
> [...]
> > > diff --git a/drivers/net/bnx2x/bnx2x_cmn.c b/drivers/net/bnx2x/bnx2x_cmn.c
> > > index e83ac6d..9691b67 100644
> > > --- a/drivers/net/bnx2x/bnx2x_cmn.c
> > > +++ b/drivers/net/bnx2x/bnx2x_cmn.c
> > > @@ -2443,11 +2443,21 @@ alloc_err:
> > >
> > > }
> > >
> > > +static int bnx2x_reload_if_running(struct net_device *dev)
> > > +{
> > > + struct bnx2x *bp = netdev_priv(dev);
> > > +
> > > + if (unlikely(!netif_running(dev)))
> > > + return 0;
> > > +
> > > + bnx2x_nic_unload(bp, UNLOAD_NORMAL);
> > > + return bnx2x_nic_load(bp, LOAD_NORMAL);
> > > +}
> > > +
> > > /* called with rtnl_lock */
> > > int bnx2x_change_mtu(struct net_device *dev, int new_mtu)
> > > {
> [...]
> > > +u32 bnx2x_fix_features(struct net_device *dev, u32 features)
> > > +{
> > > + struct bnx2x *bp = netdev_priv(dev);
> > > +
> > > + if (bp->recovery_state != BNX2X_RECOVERY_DONE) {
> > > + netdev_err(dev, "Handling parity error recovery. Try again later\n");
> > > +
> > > + /* Don't allow bnx2x_set_features() to be called now. */
> > > + return dev->features;
> > > + }
> > > +
> > > + /* TPA requires Rx CSUM offloading */
> > > + if (!(features & NETIF_F_RXCSUM) || bp->disable_tpa)
> > > + features &= ~NETIF_F_LRO;
> > Shouldn't it be (NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM) and not
> > NETIF_F_RXCSUM?
> [...]
> > In addition this function should ensure NETIF_F_IP_CSUM and
> > NETIF_F_IPV6_CSUM are changed together.
> [...]
> > > +int bnx2x_set_features(struct net_device *dev, u32 features)
> [...]
> > Since there is no set_rx_csum() anymore the above function has to handle
> > bp->rx_csum namely correlate it with (NETIF_F_IP_CSUM |
> > NETIF_F_IPV6_CSUM) bits in the 'features'.
>
> You seem to confuse TX checksum offloads (IP_CSUM,IPV6_CSUM) with
> RX checksum offload (RXCSUM).
You are right. My bad. However, you forgot to add RXCSUM to hw_features
in v2, but I see it is fixed in v4.
>
> The driver doesn't touch hardware state on changes to checksum offloads
> so they are independent - there's no point in adding artificial
> dependencies here.
Regarding Tx csum offloads you are right, but this is not true for
the Rx csum offload, and this is what I meant above. I see that v4
handles it properly now. Sorry for the confusion.
>
> [...]
> > > diff --git a/drivers/net/bnx2x/bnx2x_main.c b/drivers/net/bnx2x/bnx2x_main.c
> > > index f3cf889..ffa0611 100644
> > > --- a/drivers/net/bnx2x/bnx2x_main.c
> > > +++ b/drivers/net/bnx2x/bnx2x_main.c
> > > @@ -7661,6 +7661,7 @@ exit_leader_reset:
> > > bp->is_leader = 0;
> > > bnx2x_release_hw_lock(bp, HW_LOCK_RESOURCE_RESERVED_08);
> > > smp_wmb();
> > > + netdev_update_features(bp->dev);
> > > return rc;
> > > }
> >
> > Before I continue I'd like to clarify one thing: there is no sense to
> > call for netdev_update_features() if bnx2x_nic_load(), called right
> > before it, has failed as long as the following bnx2x_nic_load() that
> > will be called from the netdev_update_features() flow will also fail
> > (for the same reasons as the previous one). If bnx2x_nic_load() fails
> > for the certain NIC we actually shut this NIC down. So, the following
> > remarks will be based on the above statement.
>
> In all those cases, bnx2x_reload_if_running() will be called only when
> LRO state is changed while there's a recovery in progress.
Hmmm... And what about all other features from hw_features? What if they
have changed (in wanted_features) while recovery was in progress?
According to the __netdev_update_features() code it will invoke
ndo_set_features() in these cases too. Am I missing something here?
>
> [...]
> > U shouldn't call for netdev_update_features(bp->dev) if bnx2x_nic_load()
> > has failed. It would also be nice if netdev_update_features() would
> > propagate the exit status of ndo_set_features() when ndo_set_features()
> > fails in the __netdev_update_features().
>
> That's fixed in v3.
Not everything. See below.
>
> > See the patch for the bnx2x below:
> >
> > @@ -8993,7 +8995,14 @@ static int bnx2x_open(struct net_device *dev)
> >
> > bp->recovery_state = BNX2X_RECOVERY_DONE;
> >
> > - return bnx2x_nic_load(bp, LOAD_OPEN);
> > + rc = bnx2x_nic_load(bp, LOAD_OPEN);
> > + if (!rc)
> > + netdev_update_features(bp->dev);
> > +
> > + if (bp->state == BNX2X_STATE_OPEN)
> > + return 0;
> > + else
> > + return -EBUSY;
> > }
>
> Hmm. I missed this part in the v3 patch. This clobbers bnx2x_nic_load()'s
> error return, though.
Exactly! Quoting my remark above: "It would also be nice if
netdev_update_features() would propagate the exit status of
ndo_set_features() when ndo_set_features() fails in the
__netdev_update_features()." Could you comment on this, please?
>
> > > /* called with rtnl_lock */
> > > @@ -9304,6 +9309,8 @@ static const struct net_device_ops bnx2x_netdev_ops = {
> > > .ndo_validate_addr = eth_validate_addr,
> > > .ndo_do_ioctl = bnx2x_ioctl,
> > > .ndo_change_mtu = bnx2x_change_mtu,
> > > + .ndo_fix_features = bnx2x_fix_features,
> > > + .ndo_set_features = bnx2x_set_features,
> > > .ndo_tx_timeout = bnx2x_tx_timeout,
> > > #ifdef CONFIG_NET_POLL_CONTROLLER
> > > .ndo_poll_controller = poll_bnx2x,
> > > @@ -9430,20 +9437,18 @@ static int __devinit bnx2x_init_dev(struct pci_dev *pdev,
> > >
> > > dev->netdev_ops = &bnx2x_netdev_ops;
> > > bnx2x_set_ethtool_ops(dev);
> > > - dev->features |= NETIF_F_SG;
> > > - dev->features |= NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM;
> > > +
> > > if (bp->flags & USING_DAC_FLAG)
> > > dev->features |= NETIF_F_HIGHDMA;
> > > - dev->features |= (NETIF_F_TSO | NETIF_F_TSO_ECN);
> > > - dev->features |= NETIF_F_TSO6;
> > > - dev->features |= (NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX);
> > >
> > > - dev->vlan_features |= NETIF_F_SG;
> > > - dev->vlan_features |= NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM;
> > > - if (bp->flags & USING_DAC_FLAG)
> > > - dev->vlan_features |= NETIF_F_HIGHDMA;
> > > - dev->vlan_features |= (NETIF_F_TSO | NETIF_F_TSO_ECN);
> > > - dev->vlan_features |= NETIF_F_TSO6;
> > > + dev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
> > > + NETIF_F_TSO | NETIF_F_TSO_ECN | NETIF_F_TSO6 |
> > > + NETIF_F_HW_VLAN_TX;
> > hw_features are missing NETIF_F_GRO and NETIF_F_LRO flags that are
> > currently configured in bnx2x_init_bp().
>
> GRO is enabled by core now. LRO is fixed in v3.
Got it. Thanks.
>
> > > + dev->features |= dev->hw_features | NETIF_F_HW_VLAN_RX;
> > > +
> > > + dev->vlan_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
> > > + NETIF_F_TSO | NETIF_F_TSO_ECN | NETIF_F_TSO6 | NETIF_F_HIGHDMA;
> > I'm not sure if it's safe to set NETIF_F_HIGHDMA unconditionally. I
> > think it's better to correlate it with the USING_DAC_FLAG which is set
> > according to what is returned by
> > dma_set_mask(&pdev->dev, DMA_BIT_MASK(64)).
>
> dev->vlan_features get masked with dev->features and only then applied
> to VLAN device.
OK. However, could you please add the above sentence of yours as a comment
on this code line? ;)
See my further comments on v4.
thanks,
vlad
>
> Best Regards,
> Michał Mirosław
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
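The point raised in this thread about not clobbering the return value of bnx2x_nic_load() can be sketched as follows. This is a hypothetical user-space model, not driver code: struct nic_model and nic_open_model() are invented for illustration, load_rc stands in for the result of bnx2x_nic_load(), and the counter stands in for a call to netdev_update_features().

```c
#include <errno.h>

/* Hypothetical model of the state an open() path looks at. */
struct nic_model {
	int load_rc;          /* stands in for the bnx2x_nic_load() result */
	int is_open;          /* stands in for bp->state == BNX2X_STATE_OPEN */
	int features_updated; /* counts netdev_update_features() stand-ins */
};

static int nic_open_model(struct nic_model *nic)
{
	int rc = nic->load_rc;

	/* Propagate the original error instead of replacing it. */
	if (rc)
		return rc;

	/* Refresh features only after a successful load. */
	nic->features_updated++;

	return nic->is_open ? 0 : -EBUSY;
}
```

In this model a failed load returns its own error code and never triggers a feature update, which is the behaviour both sides of the discussion appear to want.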
^ permalink raw reply
* [PATCH NET-2.6 0/1]qlcnic: bug fix
From: Amit Kumar Salecha @ 2011-04-12 7:19 UTC (permalink / raw)
To: davem; +Cc: netdev, ameen.rahman, anirban.chakraborty
David,
Please apply this fix to the net-2.6 tree.
This patch will cause a hunk failure when merging into the net-next tree.
I can't avoid it: the two lines below the diff have changed in qlcnic_xmit_frame().
-Amit
^ permalink raw reply
* [PATCH NET-2.6 1/1] qlcnic: limit skb frags for non tso packet
From: Amit Kumar Salecha @ 2011-04-12 7:19 UTC (permalink / raw)
To: davem; +Cc: netdev, anirban.chakraborty, stable, ameen.rahman
In-Reply-To: <1302592781-13881-1-git-send-email-amit.salecha@qlogic.com>
Machines are deadlocking in a four-node cluster environment.
All nodes are accessing (find /gfs2 -depth -print|cpio -ocv > /dev/null)
200 GB of storage on a GFS2 filesystem.
This results in memory fragmentation, and the driver receives 18 frags for
1448-byte packets.
For non-TSO packets, the firmware drops the tx request if it has more than 14 frags.
Fix it by pulling the extra frags.
Cc: stable@kernel.org
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
drivers/net/qlcnic/qlcnic.h | 1 +
drivers/net/qlcnic/qlcnic_main.c | 14 ++++++++++++++
2 files changed, 15 insertions(+), 0 deletions(-)
diff --git a/drivers/net/qlcnic/qlcnic.h b/drivers/net/qlcnic/qlcnic.h
index dc44564..b0dead0 100644
--- a/drivers/net/qlcnic/qlcnic.h
+++ b/drivers/net/qlcnic/qlcnic.h
@@ -99,6 +99,7 @@
#define TX_UDPV6_PKT 0x0c
/* Tx defines */
+#define QLCNIC_MAX_FRAGS_PER_TX 14
#define MAX_TSO_HEADER_DESC 2
#define MGMT_CMD_DESC_RESV 4
#define TX_STOP_THRESH ((MAX_SKB_FRAGS >> 2) + MAX_TSO_HEADER_DESC \
diff --git a/drivers/net/qlcnic/qlcnic_main.c b/drivers/net/qlcnic/qlcnic_main.c
index cd88c7e..cb1a1ef 100644
--- a/drivers/net/qlcnic/qlcnic_main.c
+++ b/drivers/net/qlcnic/qlcnic_main.c
@@ -2099,6 +2099,7 @@ qlcnic_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
struct cmd_desc_type0 *hwdesc, *first_desc;
struct pci_dev *pdev;
struct ethhdr *phdr;
+ int delta = 0;
int i, k;
u32 producer;
@@ -2118,6 +2119,19 @@ qlcnic_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
}
frag_count = skb_shinfo(skb)->nr_frags + 1;
+ /* 14 frags supported for normal packet and
+ * 32 frags supported for TSO packet
+ */
+ if (!skb_is_gso(skb) && frag_count > QLCNIC_MAX_FRAGS_PER_TX) {
+
+ for (i = 0; i < (frag_count - QLCNIC_MAX_FRAGS_PER_TX); i++)
+ delta += skb_shinfo(skb)->frags[i].size;
+
+ if (!__pskb_pull_tail(skb, delta))
+ goto drop_packet;
+
+ frag_count = 1 + skb_shinfo(skb)->nr_frags;
+ }
/* 4 fragments per cmd des */
no_of_desc = (frag_count + 3) >> 2;
--
1.7.3.2
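The arithmetic behind the patch above can be checked with a small user-space sketch. It assumes, as the patch does, that the frag count is the number of page frags plus one for the linear area; bytes_to_pull() and the plain size array are hypothetical stand-ins for the skb_shinfo() accesses and are not part of the driver.

```c
#include <stddef.h>

#define MAX_FRAGS_PER_TX 14 /* mirrors QLCNIC_MAX_FRAGS_PER_TX in the patch */

/*
 * Return how many bytes must be pulled into the linear area so that
 * (1 + remaining page frags) does not exceed MAX_FRAGS_PER_TX.
 */
static size_t bytes_to_pull(const size_t *frag_sizes, size_t nr_frags)
{
	size_t frag_count = nr_frags + 1; /* +1 for the linear part */
	size_t delta = 0;
	size_t i;

	if (frag_count <= MAX_FRAGS_PER_TX)
		return 0;

	/* Sum the sizes of the leading frags that exceed the limit. */
	for (i = 0; i < frag_count - MAX_FRAGS_PER_TX; i++)
		delta += frag_sizes[i];

	return delta;
}
```

For the 18-frag case from the commit message, the first four frags are pulled, leaving 13 page frags plus the linear area.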
^ permalink raw reply related
* Re: Kernel panic when using bridge
From: Scot Doyle @ 2011-04-12 7:02 UTC (permalink / raw)
To: Eric Dumazet, Stephen Hemminger; +Cc: Hiroaki SHIMODA, netdev
In-Reply-To: <1302587490.3603.22.camel@edumazet-laptop>
On 04/12/2011 12:51 AM, Eric Dumazet wrote:
>
> Oh well, sorry (not enough time these days to even test patches)
>
> if (!skb_dst(skb)) {
--- br_netfilter.c.a 2011-04-01 02:37:53.000000000 -0500
+++ br_netfilter.c.b 2011-04-12 00:29:00.000000000 -0500
@@ -221,6 +221,7 @@ static int br_parse_ip_options(struct sk
struct ip_options *opt;
struct iphdr *iph;
struct net_device *dev = skb->dev;
+ struct rtable *rt;
u32 len;
iph = ip_hdr(skb);
@@ -255,6 +256,16 @@ static int br_parse_ip_options(struct sk
return 0;
}
+ /* Associate bogus bridge route table */
+ if (!skb_dst(skb)) {
+ rt = bridge_parent_rtable(dev);
+ if (!rt) {
+ kfree_skb(skb);
+ return 0;
+ }
+ skb_dst_set_noref(skb,&rt->dst);
+ }
+
opt->optlen = iph->ihl*4 - sizeof(struct iphdr);
if (ip_options_compile(dev_net(dev), opt, skb))
goto inhdr_error;
Now we are making progress! With the patch above from Stephen and Eric,
I cannot make the kernel panic when sending packets to the IP address of
the bridge.
However, if a guest virtual machine is sharing the bridge with the host
via a tap device, I can cause a host panic by targeting the IP address
of the guest. Is this an unrelated problem?
Here are two kernel panics. The guest virtual machine was pingable
before being attacked with IP Stack Checker's tcpsic command. Spanning
Tree Protocol was off during the first panic and on during the second.
------------
[ 606.921739] br0: port 2(tap0) entering forwarding state
[ 636.058941] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff812c2781
[ 636.058942]
[ 636.069789] Pid: 2261, comm: kvm Tainted: G W 2.6.39-rc2+ #11
[ 636.076292] Call Trace:
[ 636.078725] <IRQ> [<ffffffff8132ad78>] ? panic+0x92/0x1a1
[ 636.084287] [<ffffffff8104abe8>] ? _local_bh_enable_ip.clone.8+0x20/0x8c
[ 636.091044] [<ffffffff812c2781>] ? icmp_send+0x337/0x349
[ 636.096418] [<ffffffff810454e5>] ? __stack_chk_fail+0x17/0x17
[ 636.102221] [<ffffffff812c2781>] ? icmp_send+0x337/0x349
[ 636.107595] [<ffffffff81298527>] ? nf_iterate+0x41/0x7e
[ 636.112883] [<ffffffff81298527>] ? nf_iterate+0x41/0x7e
[ 636.118172] [<ffffffffa017b0d4>] ? br_flood+0xc8/0xc8 [bridge]
[ 636.124065] [<ffffffffa017b250>] ? __br_deliver+0xb0/0xb0 [bridge]
[ 636.130302] [<ffffffff812985d7>] ? nf_hook_slow+0x73/0x114
[ 636.135850] [<ffffffffa017b250>] ? __br_deliver+0xb0/0xb0 [bridge]
[ 636.142089] [<ffffffffa017be89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
[ 636.148586] [<ffffffffa017b250>] ? __br_deliver+0xb0/0xb0 [bridge]
[ 636.154826] [<ffffffffa017b186>] ? NF_HOOK.clone.5+0x3c/0x56 [bridge]
[ 636.161323] [<ffffffffa017bfe1>] ? br_handle_frame_finish+0x158/0x1c7 [bridge]
[ 636.168601] [<ffffffffa0180689>] ? br_nf_pre_routing_finish+0x1d4/0x1e1 [bridge]
[ 636.176052] [<ffffffffa017fc76>] ? NF_HOOK_THRESH+0x3b/0x55 [bridge]
[ 636.182463] [<ffffffffa0180c84>] ? br_nf_pre_routing+0x3be/0x3cb [bridge]
[ 636.189307] [<ffffffff812985d7>] ? nf_hook_slow+0x73/0x114
[ 636.194852] [<ffffffff81298527>] ? nf_iterate+0x41/0x7e
[ 636.200139] [<ffffffffa017be89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
[ 636.206637] [<ffffffffa017be89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
[ 636.213133] [<ffffffff812985d7>] ? nf_hook_slow+0x73/0x114
[ 636.218679] [<ffffffffa017be89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
[ 636.225177] [<ffffffffa017bfe1>] ? br_handle_frame_finish+0x158/0x1c7 [bridge]
[ 636.232455] [<ffffffffa017be89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
[ 636.238954] [<ffffffffa017be6f>] ? NF_HOOK.clone.4+0x3c/0x56 [bridge]
[ 636.245452] [<ffffffff812a7d8e>] ? tcp_gro_receive+0xa1/0x204
[ 636.251258] [<ffffffffa017c1e5>] ? br_handle_frame+0x195/0x1ac [bridge]
[ 636.257928] [<ffffffffa017c050>] ? br_handle_frame_finish+0x1c7/0x1c7 [bridge]
[ 636.265204] [<ffffffff812764ef>] ? __netif_receive_skb+0x2a7/0x450
[ 636.271443] [<ffffffff81276928>] ? netif_receive_skb+0x52/0x58
[ 636.277335] [<ffffffff81276e2a>] ? napi_gro_receive+0x1f/0x2f
[ 636.283139] [<ffffffff812769ff>] ? napi_skb_finish+0x1c/0x31
[ 636.288865] [<ffffffffa0241fcd>] ? igb_poll+0x6d9/0x9ee [igb]
[ 636.294673] [<ffffffffa003bde2>] ? scsi_run_queue+0x2ce/0x30a [scsi_mod]
[ 636.301431] [<ffffffffa017be89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
[ 636.307930] [<ffffffff812764ef>] ? __netif_receive_skb+0x2a7/0x450
[ 636.314168] [<ffffffff81276f55>] ? net_rx_action+0xa4/0x1b1
[ 636.319800] [<ffffffff8104ad26>] ? __do_softirq+0xb8/0x176
[ 636.325346] [<ffffffff81333c5c>] ? call_softirq+0x1c/0x30
[ 636.330807] [<ffffffff8100aa57>] ? do_softirq+0x3f/0x84
[ 636.336092] [<ffffffff8104af91>] ? irq_exit+0x3f/0x8f
[ 636.341204] [<ffffffff8100a793>] ? do_IRQ+0x85/0x9e
[ 636.346146] [<ffffffff8132cbd3>] ? common_interrupt+0x13/0x13
[ 636.351949] <EOI> [<ffffffff81271f58>] ? arch_local_irq_save+0x12/0x1b
[ 636.358629] [<ffffffff8100a9f2>] ? arch_local_irq_restore+0x2/0x8
[ 636.364781] [<ffffffff8127680d>] ? netif_rx_ni+0x1e/0x27
[ 636.370154] [<ffffffffa01557d2>] ? tun_get_user+0x3a3/0x3cb [tun]
[ 636.376305] [<ffffffffa0155bd8>] ? tun_get_socket+0x3b/0x3b [tun]
[ 636.382457] [<ffffffffa0155c36>] ? tun_chr_aio_write+0x5e/0x79 [tun]
[ 636.388869] [<ffffffff810f6b07>] ? do_sync_readv_writev+0x9a/0xd5
[ 636.395021] [<ffffffff810371f3>] ? need_resched+0x1a/0x23
[ 636.400481] [<ffffffff8132b725>] ? _cond_resched+0x9/0x20
[ 636.405941] [<ffffffff810f5f77>] ? copy_from_user+0x18/0x30
[ 636.411573] [<ffffffff8115fbf6>] ? security_file_permission+0x18/0x33
[ 636.418068] [<ffffffff810f6d55>] ? do_readv_writev+0xa4/0x11a
[ 636.423873] [<ffffffff810f7913>] ? fput+0x1a/0x1a2
[ 636.428726] [<ffffffff810f6f39>] ? sys_writev+0x45/0x90
[ 636.434012] [<ffffffff81332a52>] ? system_call_fastpath+0x16/0x1b
------------
[ 110.442839] br0: port 2(tap0) entering forwarding state
[ 136.948700] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff812c2781
[ 136.948702]
[ 136.959561] Pid: 1093, comm: md123_resync Not tainted 2.6.39-rc2+ #11
[ 136.965977] Call Trace:
[ 136.968408] <IRQ> [<ffffffff8132ad78>] ? panic+0x92/0x1a1
[ 136.973970] [<ffffffff8104abe8>] ? _local_bh_enable_ip.clone.8+0x20/0x8c
[ 136.980727] [<ffffffff812c2781>] ? icmp_send+0x337/0x349
[ 136.986102] [<ffffffff810454e5>] ? __stack_chk_fail+0x17/0x17
[ 136.991906] [<ffffffff812c2781>] ? icmp_send+0x337/0x349
[ 136.997281] [<ffffffff81298527>] ? nf_iterate+0x41/0x7e
[ 137.002570] [<ffffffffa0198fe1>] ? br_handle_frame_finish+0x158/0x1c7 [bridge]
[ 137.009847] [<ffffffffa019d689>] ? br_nf_pre_routing_finish+0x1d4/0x1e1 [bridge]
[ 137.017297] [<ffffffffa019cc76>] ? NF_HOOK_THRESH+0x3b/0x55 [bridge]
[ 137.023707] [<ffffffffa019dc84>] ? br_nf_pre_routing+0x3be/0x3cb [bridge]
[ 137.030551] [<ffffffff81298527>] ? nf_iterate+0x41/0x7e
[ 137.035837] [<ffffffff8103704d>] ? test_tsk_need_resched+0xe/0x17
[ 137.041991] [<ffffffffa0198e89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
[ 137.048488] [<ffffffffa0198e89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
[ 137.054984] [<ffffffff812985d7>] ? nf_hook_slow+0x73/0x114
[ 137.060531] [<ffffffffa0198e89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
[ 137.067028] [<ffffffffa0198e89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
[ 137.073526] [<ffffffffa0198e6f>] ? NF_HOOK.clone.4+0x3c/0x56 [bridge]
[ 137.080023] [<ffffffff812a7d8e>] ? tcp_gro_receive+0xa1/0x204
[ 137.085830] [<ffffffffa01991e5>] ? br_handle_frame+0x195/0x1ac [bridge]
[ 137.092500] [<ffffffffa0199050>] ? br_handle_frame_finish+0x1c7/0x1c7 [bridge]
[ 137.099776] [<ffffffff812764ef>] ? __netif_receive_skb+0x2a7/0x450
[ 137.106013] [<ffffffff81276928>] ? netif_receive_skb+0x52/0x58
[ 137.111906] [<ffffffff81276e2a>] ? napi_gro_receive+0x1f/0x2f
[ 137.117713] [<ffffffff812769ff>] ? napi_skb_finish+0x1c/0x31
[ 137.123438] [<ffffffffa0226fcd>] ? igb_poll+0x6d9/0x9ee [igb]
[ 137.129243] [<ffffffff8109034f>] ? handle_irq_event+0x40/0x55
[ 137.135049] [<ffffffff8132cbd3>] ? common_interrupt+0x13/0x13
[ 137.140854] [<ffffffff81276f55>] ? net_rx_action+0xa4/0x1b1
[ 137.146487] [<ffffffff8104ad26>] ? __do_softirq+0xb8/0x176
[ 137.152034] [<ffffffff81333c5c>] ? call_softirq+0x1c/0x30
[ 137.157494] [<ffffffff8100aa57>] ? do_softirq+0x3f/0x84
[ 137.162779] [<ffffffff8104af91>] ? irq_exit+0x3f/0x8f
[ 137.167893] [<ffffffff8100a793>] ? do_IRQ+0x85/0x9e
[ 137.172833] [<ffffffff8132cbd3>] ? common_interrupt+0x13/0x13
[ 137.178636] <EOI> [<ffffffff8106fc1a>] ? arch_local_irq_restore+0x2/0x8
[ 137.185408] [<ffffffffa0050fca>] ? _scsih_qcmd+0x54f/0x561 [mpt2sas]
[ 137.191823] [<ffffffffa01e452f>] ? scsi_dispatch_cmd+0x180/0x219 [scsi_mod]
[ 137.198841] [<ffffffffa01ea385>] ? scsi_request_fn+0x3e6/0x413 [scsi_mod]
[ 137.205683] [<ffffffff81187470>] ? elv_rqhash_add.clone.15+0x26/0x4c
[ 137.212095] [<ffffffff8118bde2>] ? __blk_run_queue+0x5e/0x84
[ 137.217814] [<ffffffff8118d63c>] ? __make_request+0x273/0x28f
[ 137.223619] [<ffffffff8118b569>] ? generic_make_request+0x267/0x2e1
[ 137.229943] [<ffffffff8105eb49>] ? remove_wait_queue+0x11/0x4d
[ 137.235837] [<ffffffffa0002417>] ? raise_barrier+0x162/0x16f [raid1]
[ 137.242246] [<ffffffff8103eba4>] ? try_to_wake_up+0x17c/0x17c
[ 137.248052] [<ffffffffa0002f2f>] ? sync_request+0x567/0x583 [raid1]
[ 137.254379] [<ffffffffa00bd834>] ? md_do_sync+0x776/0xb8e [md_mod]
[ 137.260617] [<ffffffff8100e537>] ? sched_clock+0x5/0x8
[ 137.265819] [<ffffffffa00bde83>] ? md_thread+0xfa/0x118 [md_mod]
[ 137.271886] [<ffffffffa00bdd89>] ? md_rdev_init+0x8f/0x8f [md_mod]
[ 137.278124] [<ffffffffa00bdd89>] ? md_rdev_init+0x8f/0x8f [md_mod]
[ 137.284362] [<ffffffff8105e497>] ? kthread+0x7a/0x82
[ 137.289390] [<ffffffff81333b64>] ? kernel_thread_helper+0x4/0x10
[ 137.295454] [<ffffffff8105e41d>] ? kthread_worker_fn+0x149/0x149
[ 137.301519] [<ffffffff81333b60>] ? gs_change+0x13/0x13
^ permalink raw reply
* Is __xfrm_lookup always on non-atomic context ?
From: Eduardo Panisset @ 2011-04-12 5:58 UTC (permalink / raw)
To: netdev
Hi all,
I'm using XFRM for tunneling payload traffic in a Dual Stack Mobility application.
However, if the XFRM states corresponding to the XFRM policy's templates have
not been registered yet, it's possible that the current process waits for
them, using a wait queue.
But what if this function is called in atomic context (e.g. softirq)?
Thanks in advance,
Eduardo Panisset.
^ permalink raw reply
* Re: Kernel panic when using bridge
From: Eric Dumazet @ 2011-04-12 5:51 UTC (permalink / raw)
To: Scot Doyle; +Cc: Stephen Hemminger, Hiroaki SHIMODA, netdev
In-Reply-To: <4DA3E074.5090603@scotdoyle.com>
On Tuesday, 12 April 2011 at 00:17 -0500, Scot Doyle wrote:
> On 04/11/2011 11:22 PM, Eric Dumazet wrote:
> > Also, I would first check if skb->dst already set to not leak a dst
> >
> > if (!skb->dst) {
Oh well, sorry (not enough time these days to even test patches)
if (!skb_dst(skb)) {
> > rt = bridge_parent_rtable(dev);
> > if (!rt) {
> > kfree_skb(skb);
> > return 0;
> > }
> > skb_dst_set_noref(skb,&rt->dst);
> > }
>
> Thank you for the idea. Here is the compiler output referring to the
> first line above.
>
> net/bridge/br_netfilter.c: In function 'br_parse_ip_options':
> net/bridge/br_netfilter.c:260:10: error: 'struct sk_buff' has no member named 'dst'
>
^ permalink raw reply
* RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel
From: Wei Gu @ 2011-04-12 5:18 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Alexander Duyck, Peter Zijlstra, netdev, Kirsher, Jeffrey T
In-Reply-To: <1302584201.3603.20.camel@edumazet-laptop>
Hi,
It doesn't look any better after passing this parameter to the kernel:
kernel /vmlinuz-2.6.35.2 ro root=UUID=e96f9df8-c28a-4ea8-ac26-64fbf948bce2 rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM LANG=en_US.iso88591 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=sv-latin1 crashkernel=auto pci=bfsort rhgb quiet console=tty0 console=ttyS0,115200 processor.max_cstate=1
-----Original Message-----
From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
Sent: Tuesday, April 12, 2011 12:57 PM
To: Wei Gu
Cc: Alexander Duyck; Peter Zijlstra; netdev; Kirsher, Jeffrey T
Subject: RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel
On Tuesday, 12 April 2011 at 12:40 +0800, Wei Gu wrote:
> Hi,
> I found the problem was introduced by this revert patch "2010-08-13
> Peter Zijlstra sched: Revert nohz_ratelimit() for now"
>
> I tried removing this patch from 2.6.35.2 and then built the
> application again; after that the ixgbe driver appears to work fine.
> I don't know why reverting nohz_ratelimit() this time causes the
> problem on the ixgbe driver, since nohz_ratelimit was first
> introduced on "2010-03-11". Before that, the 2.6.32 kernel
> also did not have this problem with the ixgbe driver.
>
>
> Some log from git:
> =========================================================================================
> 2.6.35.2
> 2010-08-13 Peter Zijlstra sched: Revert nohz_ratelimit() for now
> 2.6.35.1
> 2010-08-01 Linus Torvalds Linux 2.6.35 v2.6.35
> 2010-06-17 Peter Zijlstra nohz: Fix nohz ratelimit
> 2.6.35-rc3
> 2010-03-11 Mike Galbraith sched: Rate-limit nohz
>
> Thanks
> WeiGu
>
Hmm...
Could you try to add "processor.max_cstate=1" to boot parameters ?
^ permalink raw reply
* Re: Kernel panic when using bridge
From: Scot Doyle @ 2011-04-12 5:17 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Stephen Hemminger, Hiroaki SHIMODA, netdev
In-Reply-To: <1302582172.3603.18.camel@edumazet-laptop>
On 04/11/2011 11:22 PM, Eric Dumazet wrote:
> Also, I would first check if skb->dst already set to not leak a dst
>
> if (!skb->dst) {
> rt = bridge_parent_rtable(dev);
> if (!rt) {
> kfree_skb(skb);
> return 0;
> }
> skb_dst_set_noref(skb,&rt->dst);
> }
Thank you for the idea. Here is the compiler output referring to the
first line above.
net/bridge/br_netfilter.c: In function 'br_parse_ip_options':
net/bridge/br_netfilter.c:260:10: error: 'struct sk_buff' has no member named 'dst'
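The compile error above is why the follow-up in this thread switches to the skb_dst() accessor. The sketch below is a user-space model of the general idea only: the destination is stored as a tagged pointer word and read through an accessor that strips the flag bit. All names here (struct skb_model, skb_model_dst(), MODEL_DST_NOREF) are hypothetical, and the assumption that the low bit carries a "no reference taken" flag describes this model, not necessarily the exact kernel layout.

```c
#include <stddef.h>
#include <stdint.h>

struct dst_model { int dummy; };           /* stand-in for a dst entry */
struct skb_model { uintptr_t refdst; };    /* pointer word plus flag bit */

#define MODEL_DST_NOREF   ((uintptr_t)1)   /* low bit: no refcount held */
#define MODEL_DST_PTRMASK (~MODEL_DST_NOREF)

/* Accessor: callers never touch the raw field, so no 'dst' member exists. */
static struct dst_model *skb_model_dst(const struct skb_model *skb)
{
	return (struct dst_model *)(skb->refdst & MODEL_DST_PTRMASK);
}

/* Attach a dst without taking a reference, marking that in the low bit. */
static void skb_model_dst_set_noref(struct skb_model *skb, struct dst_model *dst)
{
	skb->refdst = (uintptr_t)dst | MODEL_DST_NOREF;
}
```

Code written against the accessor keeps compiling if the underlying field is renamed or repacked, which a direct skb->dst access cannot do.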
^ permalink raw reply
* Re: [PATCH] net: ipv4: add IPPROTO_ICMP socket kind
From: Solar Designer @ 2011-04-12 5:06 UTC (permalink / raw)
To: Vasiliy Kulikov
Cc: linux-kernel, netdev, Pavel Kankovsky, Kees Cook, Dan Rosenberg,
Eugene Teo, Nelson Elhage, David S. Miller, Alexey Kuznetsov,
Pekka Savola, James Morris, Hideaki YOSHIFUJI, Patrick McHardy
In-Reply-To: <20110409101514.GA4262@albatros>
On Sat, Apr 09, 2011 at 02:15:14PM +0400, Vasiliy Kulikov wrote:
> This patch adds IPPROTO_ICMP socket kind. It makes it possible to send
> ICMP_ECHO messages and receive the corresponding ICMP_ECHOREPLY messages
> without any special privileges. In other words, the patch makes it
> possible to implement setuid-less and CAP_NET_RAW-less /bin/ping. In
> order not to increase the kernel's attack surface (in case of
> vulnerabilities in the newly added code), the new functionality is
> disabled by default, but is enabled at bootup by supporting Linux
> distributions, optionally with restriction to a group or a group range
...
> For Openwall GNU/*/Linux it is the last step on the road to the
> setuid-less distro.
More correctly, it _was_ the last step - we've already taken it, so a
revision of the patch (against OpenVZ/RHEL5 kernels) is currently in use.
We would really like this accepted into mainline, which is why Vasiliy
spends extra effort to keep the patch updated to current mainline
kernels and re-test it. If there are any comments/concerns/objections,
we'd be happy to hear those.
> Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Acked-by: Solar Designer <solar@openwall.com>
> include/net/netns/ipv4.h | 2 +
> include/net/ping.h | 69 ++++
> net/ipv4/Kconfig | 21 +
> net/ipv4/Makefile | 1 +
> net/ipv4/af_inet.c | 36 ++
> net/ipv4/icmp.c | 14 +-
> net/ipv4/ping.c | 933 ++++++++++++++++++++++++++++++++++++++++++++
> net/ipv4/sysctl_net_ipv4.c | 90 +++++
> 8 files changed, 1165 insertions(+), 1 deletions(-)
Thanks,
Alexander
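For readers who want to see what the proposed socket kind looks like from user space, here is a minimal sketch, assuming a Linux system with glibc headers. The helper names are hypothetical. Whether socket() succeeds depends on the kernel carrying this functionality and on it being enabled for the calling user, so the sketch only reports failure instead of assuming success.

```c
#include <netinet/in.h>
#include <netinet/ip_icmp.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try to open an unprivileged ICMP datagram socket; returns fd or -1. */
static int open_ping_socket(void)
{
	return socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP);
}

/* Fill in a minimal ICMP echo request header with the given sequence. */
static void build_echo_request(struct icmphdr *hdr, unsigned short seq)
{
	memset(hdr, 0, sizeof(*hdr));
	hdr->type = ICMP_ECHO;
	hdr->code = 0;
	hdr->un.echo.sequence = htons(seq);
}
```

A caller would send the header (plus optional payload) with sendto() to the target address and read the ICMP_ECHOREPLY with recvfrom(), without needing setuid or CAP_NET_RAW.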
^ permalink raw reply
* Re: Race condition when creating multiple namespaces?
From: Hans Schillstrom @ 2011-04-12 4:56 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: netdev, Daniel Lezcano
In-Reply-To: <m1ei58co08.fsf@fess.ebiederm.org>
On Tuesday, April 12, 2011 02:27:35 Eric W. Biederman wrote:
> Hans Schillstrom <hans@schillstrom.com> writes:
>
> > Hello
> > I've been struggling with this for some time now.
> >
> > When creating multiple namespaces using lxc-start, uninitialized network namespace parts will be called by the new process in the namespace,
> > e.g. when using conntrack or ipvsadm too quickly (a sleep 2 "solves" the problem).
> > (From what I can see, the clone() syscall is used in lxc-start, i.e. do_fork will be called later on.)
> > Actually, I was debugging ip_vs when closing multiple namespaces when I ran into this one.
> >
> > I have a loop that creates 33 containers with lxc-start ... -- test.sh
> > The first thing the new container does in test.sh is
> > #!/bin/bash
> > iptables -t mangle -A PREROUTING -m conntrack --ctstate RELATED,ESTABLISHED -j CONNMARK --restore-mark
> > nc -l -p1234
> >
> > This results in NULL ptr in ip_conntrack_net_init(struct *net)
>
> Ouch!
>
> > In another test, test.sh looks like this:
> > #!/bin/bash
> > ipvsadm --start-daemon=master --mcast-interface=lo
> > nc -l -p1234
> >
> > And this results in an uninitialized spinlock in ip_vs_sync
> >
> > I put a printk in nsproxy: copy_namespaces() and could see dozens of them
> > before anything appeared from ipvs or conntrack.
> >
> > My feeling is that when you start up user processes in a new namespace,
> > all kernel-related init should have been done (you should not need to add a sleep to get it working).
> >
> > All tests were made using today's net-next-2.6 (2.6.39-rc1).
> >
> > Note:
> > Neither the conntrack nor the ip_vs modules were loaded;
> > if the modules were loaded before creating new namespaces, it all works...
> >
> > Finally, the question:
> > Should it really work to load modules within a namespace
> > that is a part of netns?
>
> From an implementation point of view kernel modules are not in a
> namespace, so there should be no difference between being in a namespace
> and loading a kernel networking module and not being in a namespace and
> loading in a kernel module.
>
> It does sound like you have hit a module loading race, and perhaps
> a race that is confined to network namespaces.
>
> My head is in another problem so I won't be able to look at this for
> a bit. But if you are getting into ip_conntrack_net_init with
> a NULL network namespace something spectacularly bad is happening.
OK I'll continue to dig into this.
>
> In particular it looks like you must be hitting a bug in for_each_net.
> Which would pretty much have to be a race in adding or removing from
> net_namespace_list.
It was further down in proc_net_fops_create()
>
> I took a quick skim through the code and whenever we modify the
> net_namespace we hold both the net_mutex and inside it the rtnl_lock so I
> don't immediately see how you could be getting a NULL net into
> ip_conntrack_net_init.
I did have the same problem in ip_vs a couple of times, but at the time I thought it was caused by my changes...
In the ip_vs case it looks more like a race or a missing lock: one core reaches a not fully initialized ipvs struct.
That could be my fault, e.g. a bad ordering of the register_pernet_subsys calls...
>
> Is there a codepath besides register_pernet_subsys that is calling
> ip_conntrack_net_init?
Not that I can see...
>
> Do you have any local modifications that could be messing up register_pernet_subsys?
Not right now (I removed them; this is a clean git clone)
>
> Eric
>
I will continue with this today
Thanks a lot
Hans
^ permalink raw reply
* RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel
From: Eric Dumazet @ 2011-04-12 4:56 UTC (permalink / raw)
To: Wei Gu; +Cc: Alexander Duyck, Peter Zijlstra, netdev, Kirsher, Jeffrey T
In-Reply-To: <D12839161ADD3A4B8DA63D1A134D084026E490995D@ESGSCCMS0001.eapac.ericsson.se>
Le mardi 12 avril 2011 à 12:40 +0800, Wei Gu a écrit :
> Hi,
> I found the problem was introduced by this revert patch "2010-08-13
> Peter Zijlstra sched: Revert nohz_ratelimit() for now"
>
> I tried removing this patch from 2.6.35.2 and rebuilding, and then
> the ixgbe driver appears to work fine.
> I don't know why reverting nohz_ratelimit() causes the problem with
> the ixgbe driver this time, since nohz_ratelimit was first introduced
> on 2010-03-11, and the 2.6.32 kernel, which predates it, does not
> have this problem with the ixgbe driver either.
>
>
> Some log from git:
> =========================================================================================
> 2.6.35.2
> 2010-08-13 Peter Zijlstra sched: Revert nohz_ratelimit() for now
> 2.6.35.1
> 2010-08-01 Linus Torvalds Linux 2.6.35 v2.6.35
> 2010-06-17 Peter Zijlstra nohz: Fix nohz ratelimit
> 2.6.35-rc3
> 2010-03-11 Mike Galbraith sched: Rate-limit nohz
>
> Thanks
> WeiGu
>
Hmm...
Could you try to add "processor.max_cstate=1" to boot parameters ?
^ permalink raw reply
* [PATCH] driver/e1000e: Fix default interrupt mode select
From: Prabhakar Kushwaha @ 2011-04-12 4:56 UTC (permalink / raw)
To: linuxppc-dev, linux.nics, auke-jan.h.kok, e1000-devel, netdev
Cc: meet2prabhu, Prabhakar, Jin Qing
From: Prabhakar <prabhakar@freescale.com>
The Intel e1000e device driver defaults to MSI-X interrupt mode, even if MSI
support is not enabled.
Signed-off-by: Jin Qing <b24347@freescale.com>
Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
---
Based upon git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git(branch master)
added netdev mail-list and e1000 mail-list & maintainer
drivers/net/e1000e/param.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/drivers/net/e1000e/param.c b/drivers/net/e1000e/param.c
index a150e48..7b3bbec 100644
--- a/drivers/net/e1000e/param.c
+++ b/drivers/net/e1000e/param.c
@@ -390,7 +390,11 @@ void __devinit e1000e_check_options(struct e1000_adapter *adapter)
.type = range_option,
.name = "Interrupt Mode",
.err = "defaulting to 2 (MSI-X)",
+#ifdef CONFIG_PCI_MSI
.def = E1000E_INT_MODE_MSIX,
+#else
+ .def = E1000E_INT_MODE_LEGACY,
+#endif
.arg = { .r = { .min = MIN_INTMODE,
.max = MAX_INTMODE } }
};
--
1.7.3
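
For readers who want to see the compile-time default selection from the patch above in isolation, here is a minimal userspace sketch. The enum values for legacy and MSI (0 and 1) and the helper names default_int_mode() and int_mode_name() are assumptions made for this example; only the value 2 for MSI-X is confirmed by the option's error string. CONFIG_PCI_MSI normally comes from the kernel build configuration and is deliberately left undefined here, so the fallback branch is the one that compiles.

```c
#include <string.h>

/* Interrupt modes; 0 and 1 are assumed, 2 (MSI-X) matches the error string. */
enum int_mode {
	INT_MODE_LEGACY = 0,
	INT_MODE_MSI    = 1,
	INT_MODE_MSIX   = 2,
};

/* Hypothetical helper: pick the default the same way the patch does. */
static enum int_mode default_int_mode(void)
{
#ifdef CONFIG_PCI_MSI
	return INT_MODE_MSIX;   /* MSI support is compiled in */
#else
	return INT_MODE_LEGACY; /* no MSI support: fall back to legacy INTx */
#endif
}

/* Hypothetical helper: human-readable name for log messages. */
static const char *int_mode_name(enum int_mode mode)
{
	switch (mode) {
	case INT_MODE_LEGACY: return "legacy";
	case INT_MODE_MSI:    return "MSI";
	case INT_MODE_MSIX:   return "MSI-X";
	}
	return "unknown";
}
```

Building the same file with -DCONFIG_PCI_MSI would flip the default to MSI-X without touching any call site, which is the point of keeping the decision in the option table.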
^ permalink raw reply related
* RE: [PATCHv2 NEXT 2/8] qlcnic: fix eswitch stats
From: Amit Salecha @ 2011-04-12 4:48 UTC (permalink / raw)
To: David Miller, Stephen Hemminger
Cc: netdev@vger.kernel.org, Ameen Rahman, Anirban Chakraborty
In-Reply-To: <20110411.155517.200362844.davem@davemloft.net>
> From: Amit Kumar Salecha <amit.salecha@qlogic.com>
> Date: Mon, 4 Oct 2010 08:14:51 -0700
>
> > Some of the counters are not implemented in fw.
> > Fw return NOT AVAILABLE VALUE as (0xffffffffffffffff).
> > Adding these counters, result in invalid value.
> >
> > Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
>
> Why are patches being posted from back in October 4, 2010?
My mail server is spamming old mails. Please ignore all of the emails listed below:
[PATCH NEXT 1/2] netxen: Notify firmware of Flex-10 interface down
[PATCHv2 NEXT 3/8] qlcnic: fix diag register
[PATCHv2 NEXT 8/8] qlcnic: set mtu lower limit
[PATCH NEXT 0/2]nexten: bug fixes
[PATCHv2 NEXT 2/8] qlcnic: fix eswitch stats
[PATCH NEXT 2/2] netxen: support for GbE port settings
[PATCHv2 NEXT 6/8] qlcnic: sparse warning fixes
[PATCHv2 NEXT 7/8] qlcnic: cleanup port mode setting
[PATCHv2 NEXT 5/8] qlcnic: fix vlan TSO on big endian machine
Sorry for the inconvenience caused to all.
-Amit
^ permalink raw reply
* [PATCH] net: davinci_emac: fix spinlock bug with dma channel cleanup
From: Sriramakrishnan A G @ 2011-04-12 4:42 UTC (permalink / raw)
To: netdev; +Cc: davinci-linux-open-source, davem, Sriramakrishnan A G
The DMA cleanup function was holding the spinlock across
a busy loop where it waits for HW to indicate teardown is complete.
This generates a backtrace, when DEBUG_SPINLOCK is enabled. Make the
locking more granular.
Signed-off-by: Sriramakrishnan A G <srk@ti.com>
---
drivers/net/davinci_cpdma.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/drivers/net/davinci_cpdma.c b/drivers/net/davinci_cpdma.c
index ae47f23..57fd0fc 100644
--- a/drivers/net/davinci_cpdma.c
+++ b/drivers/net/davinci_cpdma.c
@@ -824,6 +824,8 @@ int cpdma_chan_stop(struct cpdma_chan *chan)
/* trigger teardown */
dma_reg_write(ctlr, chan->td, chan->chan_num);
+ spin_unlock_irqrestore(&chan->lock, flags);
+
/* wait for teardown complete */
timeout = jiffies + HZ/10; /* 100 msec */
while (time_before(jiffies, timeout)) {
@@ -843,6 +845,7 @@ int cpdma_chan_stop(struct cpdma_chan *chan)
} while ((ret & CPDMA_DESC_TD_COMPLETE) == 0);
/* remaining packets haven't been tx/rx'ed, clean them up */
+ spin_lock_irqsave(&chan->lock, flags);
while (chan->head) {
struct cpdma_desc __iomem *desc = chan->head;
dma_addr_t next_dma;
--
1.6.2.4
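
The locking change described above (hold the lock to trigger teardown, drop it while polling the hardware, retake it for queue cleanup) can be sketched in userspace as follows. This is only a toy model under stated assumptions: the lock is a plain flag rather than a real spinlock, the "hardware" is a countdown, and every name (toy_chan, toy_chan_stop, and so on) is hypothetical and not part of the driver.

```c
#include <stdbool.h>

/* Toy stand-ins for the real channel and lock (all names hypothetical). */
struct toy_chan {
	bool locked;         /* models chan->lock being held */
	int  polls_until_td; /* polls before the "hardware" reports teardown */
	int  pending;        /* descriptors still queued */
	int  polls_locked;   /* polls performed while the lock was held */
};

static void toy_lock(struct toy_chan *c)   { c->locked = true; }
static void toy_unlock(struct toy_chan *c) { c->locked = false; }

/* Models one read of the teardown-complete status. */
static bool toy_teardown_done(struct toy_chan *c)
{
	if (c->locked)
		c->polls_locked++; /* would be a busy wait under the lock */
	if (c->polls_until_td > 0) {
		c->polls_until_td--;
		return false;
	}
	return true;
}

/* Returns 0 on success, -1 if teardown did not complete in max_polls. */
static int toy_chan_stop(struct toy_chan *c, int max_polls)
{
	bool done = false;
	int i;

	toy_lock(c);
	/* ... trigger teardown while holding the lock ... */
	toy_unlock(c); /* drop the lock before waiting on the hardware */

	for (i = 0; i < max_polls && !done; i++)
		done = toy_teardown_done(c);

	toy_lock(c); /* retake it only for the queue cleanup */
	while (c->pending > 0)
		c->pending--;
	toy_unlock(c);

	return done ? 0 : -1;
}
```

The polls_locked counter staying at zero is what the finer-grained locking buys. Whether other paths can legitimately touch the channel while the lock is dropped is a separate question that this model does not answer.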
^ permalink raw reply related
* RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel
From: Wei Gu @ 2011-04-12 4:40 UTC (permalink / raw)
To: Eric Dumazet, Alexander Duyck, Peter Zijlstra; +Cc: netdev, Kirsher, Jeffrey T
In-Reply-To: <1302536577.4605.1.camel@edumazet-laptop>
Hi,
I found that the problem was introduced by this revert patch: "2010-08-13 Peter Zijlstra sched: Revert nohz_ratelimit() for now"
I tried removing this patch from 2.6.35.2 and rebuilding, and then the ixgbe driver appears to work fine.
I don't know why reverting nohz_ratelimit() causes the problem with the ixgbe driver this time, since nohz_ratelimit was first introduced on 2010-03-11, and the 2.6.32 kernel, which predates it, does not have this problem with the ixgbe driver either.
Some log from git:
=========================================================================================
2.6.35.2
2010-08-13 Peter Zijlstra sched: Revert nohz_ratelimit() for now
2.6.35.1
2010-08-01 Linus Torvalds Linux 2.6.35 v2.6.35
2010-06-17 Peter Zijlstra nohz: Fix nohz ratelimit
2.6.35-rc3
2010-03-11 Mike Galbraith sched: Rate-limit nohz
Thanks
WeiGu
-----Original Message-----
From: Wei Gu
Sent: Tuesday, April 12, 2011 9:23 AM
To: 'Eric Dumazet'
Cc: Alexander Duyck; netdev; Kirsher, Jeffrey T
Subject: RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel
I was not stressing the NIC/CPU: I only send 290 Kpps of 400-byte packets towards eth10, and the CPU is almost 100% idle.
BTW, there is some problem with the perf tool on 2.6.35.2; I will try to get you the top offenders if possible.
Thanks
WeiGu
-----Original Message-----
From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
Sent: Monday, April 11, 2011 11:43 PM
To: Wei Gu
Cc: Alexander Duyck; netdev; Kirsher, Jeffrey T
Subject: RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel
Le lundi 11 avril 2011 à 23:14 +0800, Wei Gu a écrit :
> I tried the ixgbe-3.3.8 (insmod ixgbe.ko RSS=8,8,8,8,8,8,8,8 FdirMode=0,0,0,0,0,0,0,0 Node=0,0,1,1,2,2,3,3) from e1000.sf.net both on 2.6.35.1 and 2.6.35.2, same observation as 3.2.10 ixgbe driver, On 2.6.35.2 it have high rx errors:
> Ethtool -S eth10 |grep error
> rx_errors: 0
> tx_errors: 0
> rx_over_errors: 0
> rx_crc_errors: 0
> rx_frame_errors: 0
> rx_fifo_errors: 0
> rx_missed_errors: 2263088
> tx_aborted_errors: 0
> tx_carrier_errors: 0
> tx_fifo_errors: 0
> tx_heartbeat_errors: 0
> rx_long_length_errors: 0
> rx_short_length_errors: 0
> rx_csum_offload_errors: 0
> fcoe_last_errors: 0
>
It would be nice if you posted perf record / perf report results.
During your stress test, run:
perf record -a -g sleep 10
perf report
And post the "top offenders".
Thanks
^ permalink raw reply
* Re: Kernel panic when using bridge
From: Eric Dumazet @ 2011-04-12 4:22 UTC (permalink / raw)
To: Scot Doyle; +Cc: Stephen Hemminger, Hiroaki SHIMODA, netdev
In-Reply-To: <1302581384.3603.14.camel@edumazet-laptop>
Le mardi 12 avril 2011 à 06:09 +0200, Eric Dumazet a écrit :
> Le lundi 11 avril 2011 à 22:47 -0500, Scot Doyle a écrit :
> > On 04/11/2011 08:31 PM, Stephen Hemminger wrote:
> > >
> > > It would help if you gave a little more context (like diff -up)
> > > next time.
> > >
> > > I think the correct fix is for the skb handed to ip_options_compile
> > > to match the layout expected by ip_options_compile.
> > >
> > > This patch is compile tested only, please validate.
> > >
> > >
> > > Subject: [PATCH] bridge: set pseudo-route table before calling ip_options_compile
> > >
> > > For some ip options, ip_options_compile assumes it can find the associated
> > > route table. The bridge to iptables code doesn't supply the necessary
> > > reference, causing a NULL dereference.
> > >
> > > Signed-off-by: Stephen Hemminger<shemminger@vyatta.com>
> > >
> > > ---
> > > Patch against net-next-2.6, but if validated should go to net-2.6
> > > and stable.
> > >
> > > --- a/net/bridge/br_netfilter.c 2011-04-11 18:18:22.534837859 -0700
> > > +++ b/net/bridge/br_netfilter.c 2011-04-11 18:25:15.427244826 -0700
> > > @@ -221,6 +221,7 @@ static int br_parse_ip_options(struct sk
> > > struct ip_options *opt;
> > > struct iphdr *iph;
> > > struct net_device *dev = skb->dev;
> > > + struct rtable *rt;
> > > u32 len;
> > >
> > > iph = ip_hdr(skb);
> > > @@ -255,6 +256,14 @@ static int br_parse_ip_options(struct sk
> > > return 0;
> > > }
> > >
> > > + /* Associate bogus bridge route table */
> > > + rt = bridge_parent_rtable(dev);
> > > + if (!rt) {
> > > + kfree_skb(skb);
> > > + return 0;
> > > + }
> > > + skb_dst_set(skb,&rt->dst);
>
> Please try skb_dst_set_noref() here instead of skb_dst_set()
>
> Or increment rt refcount.
Also, I would first check whether skb->dst is already set, so as not to leak a dst:
if (!skb->dst) {
rt = bridge_parent_rtable(dev);
if (!rt) {
kfree_skb(skb);
return 0;
}
skb_dst_set_noref(skb,&rt->dst);
}
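
To make the reference-counting point concrete, here is a small userspace model of the difference between the two setters. It is a sketch under stated assumptions, not kernel code: the types and functions (toy_dst, toy_skb, toy_dst_set, toy_dst_set_noref, toy_skb_free) are hypothetical, and the shared route object is assumed to start with a single reference held by its owner. The underflow it detects corresponds to the dst_release warning in the trace quoted in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy models; field and function names are hypothetical, not kernel API. */
struct toy_dst {
	int refcnt; /* references currently held on this route object */
};

struct toy_skb {
	struct toy_dst *dst;
	bool dst_noref; /* true: the skb borrows the dst without owning a ref */
};

/* Like skb_dst_set(): the skb takes over a reference the caller must hold. */
static void toy_dst_set(struct toy_skb *skb, struct toy_dst *dst)
{
	skb->dst = dst;
	skb->dst_noref = false;
}

/* Like skb_dst_set_noref(): the skb only borrows the dst. */
static void toy_dst_set_noref(struct toy_skb *skb, struct toy_dst *dst)
{
	skb->dst = dst;
	skb->dst_noref = true;
}

/*
 * Freeing the skb drops a reference only if the skb owns one.
 * Returns true when the count went negative (the warning case).
 */
static bool toy_skb_free(struct toy_skb *skb)
{
	bool underflow = false;

	if (skb->dst && !skb->dst_noref) {
		skb->dst->refcnt--;
		underflow = skb->dst->refcnt < 0;
	}
	skb->dst = NULL;
	return underflow;
}
```

With the owning setter and no matching increment, every freed skb takes one reference away from the shared object, so the count eventually goes negative. With the borrowing setter the count is never touched, which is why either switching to the noref variant or incrementing the refcount before the set resolves the imbalance.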
^ permalink raw reply
* Re: Kernel panic when using bridge
From: Eric Dumazet @ 2011-04-12 4:09 UTC (permalink / raw)
To: Scot Doyle; +Cc: Stephen Hemminger, Hiroaki SHIMODA, netdev
In-Reply-To: <4DA3CB4B.9090506@scotdoyle.com>
Le lundi 11 avril 2011 à 22:47 -0500, Scot Doyle a écrit :
> On 04/11/2011 08:31 PM, Stephen Hemminger wrote:
> >
> > It would help if you gave a little more context (like diff -up)
> > next time.
> >
> > I think the correct fix is for the skb handed to ip_options_compile
> > to match the layout expected by ip_options_compile.
> >
> > This patch is compile tested only, please validate.
> >
> >
> > Subject: [PATCH] bridge: set pseudo-route table before calling ip_options_compile
> >
> > For some ip options, ip_options_compile assumes it can find the associated
> > route table. The bridge to iptables code doesn't supply the necessary
> > reference, causing a NULL dereference.
> >
> > Signed-off-by: Stephen Hemminger<shemminger@vyatta.com>
> >
> > ---
> > Patch against net-next-2.6, but if validated should go to net-2.6
> > and stable.
> >
> > --- a/net/bridge/br_netfilter.c 2011-04-11 18:18:22.534837859 -0700
> > +++ b/net/bridge/br_netfilter.c 2011-04-11 18:25:15.427244826 -0700
> > @@ -221,6 +221,7 @@ static int br_parse_ip_options(struct sk
> > struct ip_options *opt;
> > struct iphdr *iph;
> > struct net_device *dev = skb->dev;
> > + struct rtable *rt;
> > u32 len;
> >
> > iph = ip_hdr(skb);
> > @@ -255,6 +256,14 @@ static int br_parse_ip_options(struct sk
> > return 0;
> > }
> >
> > + /* Associate bogus bridge route table */
> > + rt = bridge_parent_rtable(dev);
> > + if (!rt) {
> > + kfree_skb(skb);
> > + return 0;
> > + }
> > + skb_dst_set(skb,&rt->dst);
Please try skb_dst_set_noref() here instead of skb_dst_set()
Or increment rt refcount.
> > +
> > opt->optlen = iph->ihl*4 - sizeof(struct iphdr);
> > if (ip_options_compile(dev_net(dev), opt, skb))
> > goto inhdr_error;
> >
> >
> Thanks for the advice on diff context, I appreciate it. Here's the
> output from the patch:
>
> [ 422.577325] ------------[ cut here ]------------
> [ 422.581932] WARNING: at net/core/dst.c:278 dst_release+0x2e/0x5d()
> [ 422.588086] Hardware name: PowerEdge R510
> [ 422.592075] Modules linked in: kvm_intel kvm bridge stp loop snd_pcm
> snd_timer snd soundcore snd_page_alloc i7core_edac psmouse pcspkr
> edac_core evdev serio_raw power_meter processor ghes tpm_tis dcdbas tpm
> tpm_bios thermal_sys button hed ext2 mbcache dm_mod raid1 md_mod sd_mod
> crc_t10dif usb_storage uas uhci_hcd mpt2sas scsi_transport_sas igb
> ehci_hcd raid_class scsi_mod usbcore bnx2 dca [last unloaded:
> scsi_wait_scan]
> [ 422.629510] Pid: 0, comm: swapper Not tainted 2.6.39-rc2+ #10
> [ 422.635225] Call Trace:
> [ 422.637655] <IRQ> [<ffffffff81045635>] ? warn_slowpath_common+0x78/0x8c
> [ 422.644425] [<ffffffffa01cbe89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
> [ 422.650918] [<ffffffff8127cd60>] ? dst_release+0x2e/0x5d
> [ 422.656290] [<ffffffff8126c25f>] ? skb_release_head_state+0x21/0xeb
> [ 422.662613] [<ffffffffa01cbe89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
> [ 422.669108] [<ffffffff8126c06f>] ? __kfree_skb+0x9/0x77
> [ 422.674392] [<ffffffff812985f7>] ? nf_hook_slow+0x93/0x114
> [ 422.679936] [<ffffffffa01cbe89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
> [ 422.686431] [<ffffffffa01cbe89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
> [ 422.692927] [<ffffffffa01cbe6f>] ? NF_HOOK.clone.4+0x3c/0x56 [bridge]
> [ 422.699421] [<ffffffff812a7d8e>] ? tcp_gro_receive+0xa1/0x204
> [ 422.705225] [<ffffffffa01cc1e5>] ? br_handle_frame+0x195/0x1ac [bridge]
> [ 422.711892] [<ffffffffa01cc050>] ?
> br_handle_frame_finish+0x1c7/0x1c7 [bridge]
> [ 422.719166] [<ffffffff812764ef>] ? __netif_receive_skb+0x2a7/0x450
> [ 422.725401] [<ffffffff81276928>] ? netif_receive_skb+0x52/0x58
> [ 422.731289] [<ffffffff81276e2a>] ? napi_gro_receive+0x1f/0x2f
> [ 422.737091] [<ffffffff812769ff>] ? napi_skb_finish+0x1c/0x31
> [ 422.742809] [<ffffffffa0226fcd>] ? igb_poll+0x6d9/0x9ee [igb]
> [ 422.748615] [<ffffffffa003bde2>] ? scsi_run_queue+0x2ce/0x30a [scsi_mod]
> [ 422.755371] [<ffffffffa003cb31>] ? scsi_io_completion+0x44c/0x4cf
> [scsi_mod]
> [ 422.762472] [<ffffffff81276f55>] ? net_rx_action+0xa4/0x1b1
> [ 422.768103] [<ffffffff8104ad26>] ? __do_softirq+0xb8/0x176
> [ 422.773647] [<ffffffff81333c5c>] ? call_softirq+0x1c/0x30
> [ 422.779104] [<ffffffff8100aa57>] ? do_softirq+0x3f/0x84
> [ 422.784388] [<ffffffff8104af91>] ? irq_exit+0x3f/0x8f
> [ 422.789499] [<ffffffff8100a793>] ? do_IRQ+0x85/0x9e
> [ 422.794439] [<ffffffff8132cbd3>] ? common_interrupt+0x13/0x13
> [ 422.800240] <EOI> [<ffffffff81061348>] ? enqueue_hrtimer+0x3f/0x53
> [ 422.806575] [<ffffffffa0310417>] ? arch_local_irq_enable+0x7/0x8
> [processor]
> [ 422.813676] [<ffffffffa0310dab>] ? acpi_idle_enter_c1+0x86/0xa2
> [processor]
> [ 422.820690] [<ffffffff8125d05d>] ? cpuidle_idle_call+0xf4/0x17e
> [ 422.826664] [<ffffffff81008298>] ? cpu_idle+0xa2/0xc4
> [ 422.831776] [<ffffffff8169db60>] ? start_kernel+0x3b9/0x3c4
> [ 422.837406] [<ffffffff8169d3c6>] ? x86_64_start_kernel+0x102/0x10f
> [ 422.843640] ---[ end trace 5d4687f8472ee50c ]---
>
^ permalink raw reply
* Re: Kernel panic when using bridge
From: Scot Doyle @ 2011-04-12 3:47 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Hiroaki SHIMODA, netdev
In-Reply-To: <20110411183105.46e86684@nehalam>
On 04/11/2011 08:31 PM, Stephen Hemminger wrote:
>
> It would help if you gave a little more context (like diff -up)
> next time.
>
> I think the correct fix is for the skb handed to ip_options_compile
> to match the layout expected by ip_options_compile.
>
> This patch is compile tested only, please validate.
>
>
> Subject: [PATCH] bridge: set pseudo-route table before calling ip_options_compile
>
> For some ip options, ip_options_compile assumes it can find the associated
> route table. The bridge to iptables code doesn't supply the necessary
> reference, causing a NULL dereference.
>
> Signed-off-by: Stephen Hemminger<shemminger@vyatta.com>
>
> ---
> Patch against net-next-2.6, but if validated should go to net-2.6
> and stable.
>
> --- a/net/bridge/br_netfilter.c 2011-04-11 18:18:22.534837859 -0700
> +++ b/net/bridge/br_netfilter.c 2011-04-11 18:25:15.427244826 -0700
> @@ -221,6 +221,7 @@ static int br_parse_ip_options(struct sk
> struct ip_options *opt;
> struct iphdr *iph;
> struct net_device *dev = skb->dev;
> + struct rtable *rt;
> u32 len;
>
> iph = ip_hdr(skb);
> @@ -255,6 +256,14 @@ static int br_parse_ip_options(struct sk
> return 0;
> }
>
> + /* Associate bogus bridge route table */
> + rt = bridge_parent_rtable(dev);
> + if (!rt) {
> + kfree_skb(skb);
> + return 0;
> + }
> + skb_dst_set(skb,&rt->dst);
> +
> opt->optlen = iph->ihl*4 - sizeof(struct iphdr);
> if (ip_options_compile(dev_net(dev), opt, skb))
> goto inhdr_error;
>
>
Thanks for the advice on diff context, I appreciate it. Here's the
output from the patch:
[ 422.577325] ------------[ cut here ]------------
[ 422.581932] WARNING: at net/core/dst.c:278 dst_release+0x2e/0x5d()
[ 422.588086] Hardware name: PowerEdge R510
[ 422.592075] Modules linked in: kvm_intel kvm bridge stp loop snd_pcm
snd_timer snd soundcore snd_page_alloc i7core_edac psmouse pcspkr
edac_core evdev serio_raw power_meter processor ghes tpm_tis dcdbas tpm
tpm_bios thermal_sys button hed ext2 mbcache dm_mod raid1 md_mod sd_mod
crc_t10dif usb_storage uas uhci_hcd mpt2sas scsi_transport_sas igb
ehci_hcd raid_class scsi_mod usbcore bnx2 dca [last unloaded:
scsi_wait_scan]
[ 422.629510] Pid: 0, comm: swapper Not tainted 2.6.39-rc2+ #10
[ 422.635225] Call Trace:
[ 422.637655] <IRQ> [<ffffffff81045635>] ? warn_slowpath_common+0x78/0x8c
[ 422.644425] [<ffffffffa01cbe89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
[ 422.650918] [<ffffffff8127cd60>] ? dst_release+0x2e/0x5d
[ 422.656290] [<ffffffff8126c25f>] ? skb_release_head_state+0x21/0xeb
[ 422.662613] [<ffffffffa01cbe89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
[ 422.669108] [<ffffffff8126c06f>] ? __kfree_skb+0x9/0x77
[ 422.674392] [<ffffffff812985f7>] ? nf_hook_slow+0x93/0x114
[ 422.679936] [<ffffffffa01cbe89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
[ 422.686431] [<ffffffffa01cbe89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
[ 422.692927] [<ffffffffa01cbe6f>] ? NF_HOOK.clone.4+0x3c/0x56 [bridge]
[ 422.699421] [<ffffffff812a7d8e>] ? tcp_gro_receive+0xa1/0x204
[ 422.705225] [<ffffffffa01cc1e5>] ? br_handle_frame+0x195/0x1ac [bridge]
[ 422.711892] [<ffffffffa01cc050>] ?
br_handle_frame_finish+0x1c7/0x1c7 [bridge]
[ 422.719166] [<ffffffff812764ef>] ? __netif_receive_skb+0x2a7/0x450
[ 422.725401] [<ffffffff81276928>] ? netif_receive_skb+0x52/0x58
[ 422.731289] [<ffffffff81276e2a>] ? napi_gro_receive+0x1f/0x2f
[ 422.737091] [<ffffffff812769ff>] ? napi_skb_finish+0x1c/0x31
[ 422.742809] [<ffffffffa0226fcd>] ? igb_poll+0x6d9/0x9ee [igb]
[ 422.748615] [<ffffffffa003bde2>] ? scsi_run_queue+0x2ce/0x30a [scsi_mod]
[ 422.755371] [<ffffffffa003cb31>] ? scsi_io_completion+0x44c/0x4cf
[scsi_mod]
[ 422.762472] [<ffffffff81276f55>] ? net_rx_action+0xa4/0x1b1
[ 422.768103] [<ffffffff8104ad26>] ? __do_softirq+0xb8/0x176
[ 422.773647] [<ffffffff81333c5c>] ? call_softirq+0x1c/0x30
[ 422.779104] [<ffffffff8100aa57>] ? do_softirq+0x3f/0x84
[ 422.784388] [<ffffffff8104af91>] ? irq_exit+0x3f/0x8f
[ 422.789499] [<ffffffff8100a793>] ? do_IRQ+0x85/0x9e
[ 422.794439] [<ffffffff8132cbd3>] ? common_interrupt+0x13/0x13
[ 422.800240] <EOI> [<ffffffff81061348>] ? enqueue_hrtimer+0x3f/0x53
[ 422.806575] [<ffffffffa0310417>] ? arch_local_irq_enable+0x7/0x8
[processor]
[ 422.813676] [<ffffffffa0310dab>] ? acpi_idle_enter_c1+0x86/0xa2
[processor]
[ 422.820690] [<ffffffff8125d05d>] ? cpuidle_idle_call+0xf4/0x17e
[ 422.826664] [<ffffffff81008298>] ? cpu_idle+0xa2/0xc4
[ 422.831776] [<ffffffff8169db60>] ? start_kernel+0x3b9/0x3c4
[ 422.837406] [<ffffffff8169d3c6>] ? x86_64_start_kernel+0x102/0x10f
[ 422.843640] ---[ end trace 5d4687f8472ee50c ]---
^ permalink raw reply
* Re: [Bugme-new] [Bug 33042] New: Marvell 88E1145 phy configured incorrectly in fiber mode
From: Alex Dubov @ 2011-04-12 3:45 UTC (permalink / raw)
To: Andrew Morton, David Daney
Cc: netdev, bugzilla-daemon, bugme-daemon, Grant Likely, Andy Fleming
In-Reply-To: <4DA3703B.1090802@caviumnetworks.com>
>
> How does your u-boot configure the part? Does it write any of the
> configuration registers, or is it just the default configuration set via
> the strapping pins?
U-boot configures this phy just like any other phy - by running a set of
register assignments from phy_info_M88E1145.
Unfortunately, I don't have a datasheet for this phy and the kernel does
quite a few things differently, so simply copying stuff from u-boot
does not work well (in the kernel, phy initialization is broken into 3
functions, if I'm not mistaken).
Otherwise, my problem seems to be identical to the one reported some
time ago against the 88E1111 phy (which resulted in the addition of
"marvell_read_status" in the first place). The problem then, as it seems
to be now, was that the phy is always configured in "copper" mode, without
the driver checking for the correct "fiber" mode bits.
>
> In any event, you will probably have to read the configuration before
> drivers/net/phy/marvell.c changes it. Then compare that to what
> the driver is trying to set. Then you will either have to override the
> configuration with the device tree "marvell,reg-init" property, or if
> you are not using the device tree, add a 88e1145 specific flag that you
> set when calling phy_connect().
>
> David Daney
>
^ permalink raw reply
* Re: [PATCH RESEND] uts: Set default hostname to "localhost", rather than "(none)"
From: Valdis.Kletnieks @ 2011-04-12 2:47 UTC (permalink / raw)
To: Josh Triplett
Cc: David Miller, netdev, Serge E. Hallyn, Andrew Morton,
Linus Torvalds, linux-kernel
In-Reply-To: <20110411050155.GA2507@feather>
[-- Attachment #1: Type: text/plain, Size: 630 bytes --]
On Sun, 10 Apr 2011 22:01:59 PDT, Josh Triplett said:
> Change the default hostname to "localhost". This removes the need for
> the standard fallback, provides a useful default for systems that never
> call sethostname, and makes minimal systems that much more useful with
> less configuration.
Seems sane enough to me. Only possible objection I can think of is "if you're running
with 'init=/bin/sh' or similar config too crippled to run /bin/hostname, maybe your
network config *should* be intentionally toasted so you can't get further surprises".
I personally don't agree - just saying somebody might hold that position.
[-- Attachment #2: Type: application/pgp-signature, Size: 227 bytes --]
^ permalink raw reply
* RE: [PATCH v2] net: r8169: convert to hw_features
From: hayeswang @ 2011-04-12 2:10 UTC (permalink / raw)
To: 'François Romieu',
'Michał Mirosław'
Cc: netdev, 'David Dillow'
In-Reply-To: <20110411184739.GA17331@electric-eye.fr.zoreil.com>
> From: François Romieu [mailto:romieu@fr.zoreil.com]
> Sent: Tuesday, April 12, 2011 2:48 AM
> To: Michał Mirosław
> Cc: netdev@vger.kernel.org; David Dillow; Hayeswang
> Subject: Re: [PATCH v2] net: r8169: convert to hw_features
>
>
> Hayes, I have a 8168c manual at hand. Do all 8168 have the
> same Tx descriptors layout ?
>
Yes, all 8168 have the same Tx descriptors layout except for 8168B series.
Best Regards,
Hayes
^ permalink raw reply
* Re: [Bugme-new] [Bug 32872] New: LLC PDU is dropped if skb is not linear
From: David Miller @ 2011-04-12 1:56 UTC (permalink / raw)
To: akpm; +Cc: vitalyb, bugzilla-daemon, bugme-daemon, netdev
In-Reply-To: <20110411164812.8f84f995.akpm@linux-foundation.org>
From: Andrew Morton <akpm@linux-foundation.org>
Date: Mon, 11 Apr 2011 16:48:12 -0700
> --- linux-2.6.32.36/net/llc/llc_input.c.orig 2009-12-03 05:51:21.000000000 +0200
> +++ linux-2.6.32.36/net/llc/llc_input.c 2011-04-08 08:57:29.000000000 +0300
> @@ -105,6 +105,11 @@
> if (unlikely(!pskb_may_pull(skb, sizeof(*pdu))))
> return 0;
>
> + if (skb->data_len != 0){
> + if (unlikely(skb_linearize(skb)))
> + return 0;
> + }
> +
> pdu = (struct llc_pdu_un *)skb->data;
> if ((pdu->ctrl_1 & LLC_PDU_TYPE_MASK) == LLC_PDU_TYPE_U)
> llc_len = 1;
>
>
> 2.6.32 is a pretty old kernel - we'll need to verify if current kernels
> have the same problem.
>
> Please don't send patches via bugzilla - it causes lots of problems
> with our usual patch management and review processes. It's preferred
> that patches be sent via email as per Documentation/SubmittingPatches,
> and that they include a Signed-off-by:, as described in that file.
The skb_tail_pointer() check in llc_fixup_skb() is beyond wonky and
honestly the source of the problems here.
I'd suggest instead:
diff --git a/net/llc/llc_input.c b/net/llc/llc_input.c
index 058f1e9..9032421 100644
--- a/net/llc/llc_input.c
+++ b/net/llc/llc_input.c
@@ -121,8 +121,7 @@ static inline int llc_fixup_skb(struct sk_buff *skb)
s32 data_size = ntohs(pdulen) - llc_len;
if (data_size < 0 ||
- ((skb_tail_pointer(skb) -
- (u8 *)pdu) - llc_len) < data_size)
+ !pskb_may_pull(skb, data_size))
return 0;
if (unlikely(pskb_trim_rcsum(skb, data_size)))
return 0;
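
To illustrate why the tail-pointer test rejects non-linear packets while the suggested replacement accepts them, here is a simplified userspace model. It is a sketch under stated assumptions: the toy_pkt structure and all function names are hypothetical, and toy_may_pull() only models the length logic of pskb_may_pull() (the real function can also fail if it cannot allocate memory for the pulled data).

```c
#include <stdbool.h>

/* Toy model of a possibly non-linear packet buffer (names hypothetical). */
struct toy_pkt {
	int len;      /* total bytes: linear area plus fragments */
	int data_len; /* bytes held in fragments only */
};

/* Bytes directly addressable in the linear area. */
static int toy_headlen(const struct toy_pkt *p)
{
	return p->len - p->data_len;
}

/* Old-style check: only the linear area counts towards the payload. */
static bool old_check_ok(const struct toy_pkt *p, int llc_len, int data_size)
{
	if (data_size < 0)
		return false;
	return toy_headlen(p) - llc_len >= data_size;
}

/* Length logic of a pull: succeed if n bytes exist anywhere in the packet. */
static bool toy_may_pull(struct toy_pkt *p, int n)
{
	if (n <= toy_headlen(p))
		return true;  /* already linear */
	if (n > p->len)
		return false; /* packet really is too short */
	p->data_len -= n - toy_headlen(p); /* move bytes into the linear area */
	return true;
}

/* New-style check, following the shape of the suggested patch. */
static bool new_check_ok(struct toy_pkt *p, int data_size)
{
	return data_size >= 0 && toy_may_pull(p, data_size);
}
```

In this model a 100-byte packet with 60 bytes in fragments fails the old check for a 90-byte payload but passes the new one, while a genuinely truncated packet fails both.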
^ permalink raw reply related
* Re: Kernel panic when using bridge
From: Stephen Hemminger @ 2011-04-12 1:31 UTC (permalink / raw)
To: Scot Doyle; +Cc: Hiroaki SHIMODA, netdev, Sebastian Nickel, Pallai Roland
In-Reply-To: <4DA39330.2030102@scotdoyle.com>
On Mon, 11 Apr 2011 18:48:00 -0500
Scot Doyle <lkml@scotdoyle.com> wrote:
> On 04/09/2011 02:19 AM, Hiroaki SHIMODA wrote:
> >
> > It seems that the bug trap occurred in ip_options_compile() because
> > rt is NULL.
> >
> > 8b 96 cc 00 00 00 mov 0xcc(%rsi),%edx
> > rsi is rt, and 0xcc means rt->rt_spec_dst. So I think below code hit
> > the bug trap.
> >
> > 332 if (skb) {
> > 333 memcpy(&optptr[optptr[2]-1],&rt->rt_spec_dst, 4);<- here
> > 334 opt->is_changed = 1;
> > 335 }
> >
> > And call trace seems as follows.
> > __netif_receive_skb()
> > -> br_handle_frame()
> > -> NF_HOOK()
> > -> br_nf_pre_routing()
> > -> br_parse_ip_options()
> > -> ip_options_compile()
> >
> > br_parse_ip_options() was introduced at 462fb2a (bridge : Sanitize
> > skb before it enters the IP stack) but ip_options_compile() or
> > ip_options_rcv_srr() seems to be called with no rt info.
>
> Thanks to a tip from Sebastian, I can now reproduce this panic by
> running "IP Stack Integrity Checker v0.07" from another machine on the
> same subnet with command "icmpsic -s x.y.z.a -d x.y.z.b" where "x.y.z.a"
> is IP address of the other machine and "x.y.z.b" is the IP address of
> the target. When I enable iptables logging on the target machine, no
> panic occurs. When I disable iptables logging (but otherwise leave the
> same iptables rules) a panic occurs within a few seconds.
>
> Thanks Hiroaki for the analysis of the kernel panic output. I've
> confirmed that you are correct by placing a printk just before those two
> lines. In every panic, the printk was triggered on line 333 of
> net/ipv4/ip_options.c
>
> The kernel panic does not occur after applying the following patch.
>
> # diff net/ipv4/ip_options.c.original net/ipv4/ip_options.c.fix
> 332c332
> < if (skb) {
> ---
> > if (skb && rt) {
> 374c374
> < if (skb) {
> ---
> > if (skb && rt) {
>
> What do you all think? Will it cause other problems?
It would help if you gave a little more context (like diff -up)
next time.
I think the correct fix is for the skb handed to ip_options_compile
to match the layout expected by ip_options_compile.
This patch is compile tested only, please validate.
Subject: [PATCH] bridge: set pseudo-route table before calling ip_options_compile
For some ip options, ip_options_compile assumes it can find the associated
route table. The bridge to iptables code doesn't supply the necessary
reference, causing a NULL dereference.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
---
Patch against net-next-2.6, but if validated should go to net-2.6
and stable.
--- a/net/bridge/br_netfilter.c 2011-04-11 18:18:22.534837859 -0700
+++ b/net/bridge/br_netfilter.c 2011-04-11 18:25:15.427244826 -0700
@@ -221,6 +221,7 @@ static int br_parse_ip_options(struct sk
struct ip_options *opt;
struct iphdr *iph;
struct net_device *dev = skb->dev;
+ struct rtable *rt;
u32 len;
iph = ip_hdr(skb);
@@ -255,6 +256,14 @@ static int br_parse_ip_options(struct sk
return 0;
}
+ /* Associate bogus bridge route table */
+ rt = bridge_parent_rtable(dev);
+ if (!rt) {
+ kfree_skb(skb);
+ return 0;
+ }
+ skb_dst_set(skb, &rt->dst);
+
opt->optlen = iph->ihl*4 - sizeof(struct iphdr);
if (ip_options_compile(dev_net(dev), opt, skb))
goto inhdr_error;
^ permalink raw reply