* Re: [PATCH] mwl8k: Add 0x2a02 PCI device-id (Marvell 88W8361)
From: Sedat Dilek @ 2012-04-28 23:26 UTC (permalink / raw)
To: Jim Cromie
Cc: Lennert Buytenhek, John W. Linville,
linux-wireless-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA, lautriv
In-Reply-To: <CAJfuBxwz2PmNY37VB7Wx+GdQcOcS6Y96EM_JNtvs-dyv-imL6g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On Sun, Apr 29, 2012 at 1:11 AM, Jim Cromie <jim.cromie-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Sat, Apr 28, 2012 at 4:49 PM, Sedat Dilek <sedat.dilek@googlemail.com> wrote:
>> On Sun, Apr 29, 2012 at 12:36 AM, Jim Cromie <jim.cromie-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
>>>>> As already pointed out, no Marwell WLAN hardware here. Marvell comics
>>>>> of course :-).
>
> (I had to leave that one..:-)
>
>
>
>>>>
>>>> A new tarball from lautriv with same outputs as before, but now tested
>>>> with Linux-3.4-rc4.
>>>>
>>>> - Sedat -
>>>
>>>
>>> heres my logs, using firmware extracted by Sedat's script,
>>> and the patch on mwl8k.c
>>>
>>> bottom-line, it appears to be working.
>>>
>>> its contents are a bit more pedantic, and includes data for
>>> another wifi card (rtl8180 based) also in the box.
>>> It was obtained by this script:
>>>
>>> #!/bin/bash
>>>
>>> # dmesg (Linux-3.3.3)
>>> # e_n_a (/etc/network/interfaces)
>>> # ifconfig output
>>> # iwconfig output
>>> # iw_phy output
>>> # ps_axu (WPA) output
>>>
>>> devs="wlan0 wlan1"
>>> apmac=00:14:d1:e8:65:0a
>>>
>>> loudly () {
>>> echo "# $@"
>>> fname=`echo $@ | sed -e 's/ /-/g'`
>>> $@ 2> $fname-err | tee $fname
>>> [ $? != 0 ] && echo non-zero exit on $fname: $?
>>> [ -s $fname-err ] || rm $fname-err
>>> }
>>>
>>> ( iw --debug event -f > iw-event-f )&
>>> pid_event=$!
>>>
>>> for N in 0 1 ; do
>>> loudly iw dev wlan$N interface add fish$N type monitor # flags none
>>> loudly iw dev fish$N set channel 8
>>> loudly ifconfig fish$N up
>>> ( tcpdump -i fish$N -s 65000 -p -U -w fish$N.dump )&
>>> pid_dump_fish$N=$!
>>> done
>>>
>>> loudly iw list
>>>
>>> #loudly iwspy
>>> # gives: Interface doesn't support wireless statistic collection
>>>
>>> for dev in $devs ; do
>>> loudly ifconfig $dev
>>> loudly iwconfig $dev
>>> loudly iwlist $dev scan
>>> loudly iw dev $dev info
>>> loudly iw dev $dev link
>>> loudly iw dev $dev scan
>>> loudly iw dev $dev survey dump
>>> done
>>>
>>> for phy in $phys ; do
>>> loudly iw phy $phy info
>>> done
>>>
>>> # these are unsupported on wlan0
>>> loudly iw dev wlan1 survey dump
>>> loudly iw dev wlan1 station dump
>>> loudly iw dev wlan1 station get $apmac
>>>
>>>
>>> for N in 0 1 ; do
>>> loudly iw dev fish$N del
>>> done
>>>
>>> kill $pid_dump_fish0 $pid_dump_fish0
>>> kill $pid_event
>>>
>>> dmesg > dmesg
>>>
>>> grep -vE '^#|key' /etc/network/interfaces > e_n_a
>>>
>>> exit
>>
>> Hi Jim,
>>
>> thanks for your testing and the nice testcase-script!
>>
>> lautriv you wanna run some more tests with Jim's script?
>>
>> Jim, how stable/fast/reliable is your WLAN connection?
>> Suspend/resume tested?
>
> I havent tested reliability in any way.
> in fact, I havent tested any data-xfer per se,
> will do an iperf test soon.
>
> That said, bitrate is quite low, I havent looked at why.
>
> jimc@chumly:~/projects/lx/wifi/mwl8k-8361p-logs$ grep -i MBit *
> iw-dev-wlan0-link: tx bitrate: 11.0 MBit/s
> iw-dev-wlan1-link: tx bitrate: 1.0 MBit/s
> iw-dev-wlan1-station-dump: tx bitrate: 1.0 MBit/s
> iw-dev-wlan1-station-get-00:14:d1:e8:65:0a: tx bitrate: 1.0 MBit/s
>
> my laptop is much faster than both cards in the soekris box, to same AP
>
> Connected to 00:14:d1:e8:65:0a (on wlan0)
> SSID: yoduh
> freq: 2447
> RX: 191134302 bytes (2120068 packets)
> TX: 17440426 bytes (120666 packets)
> signal: -45 dBm
> tx bitrate: 54.0 MBit/s
>
> bss flags:
> dtim period: 0
> beacon int: 100
>
>
> If you all have some suggestions on this, Id like to hear them.
> And of course, any other testing you'd like too.
>
>
>>
>> Hope this helps to get native Linux support for 8361p.
>
> hear hear.
> FWIW, I pulled this card out of a dead Netgear WNR854T,
> which is linux based (and GPL compliant)
>
>> Regards,
>> - Sedat -
>>
>> P.S.: BTW, only to clarify it should be "e_n_i" as short-form for
>> /etc/network/interfaces file, but e_n_a sounds more female and nicer
>> :-).
>
> I caught that, but it wasnt worth "correcting" ;-)
>
> thanks
> Jim
Unfortunately, [1] says not much about debugging.
Anyway, Lennert has some new informations.
Let's see what the experts will say.
- Sedat -
[1] http://wireless.kernel.org/en/users/Drivers/mwl8k
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH 1/2] AT91: Remove fixed mapping for AT91RM9200 ethernet
From: David Miller @ 2012-04-29 1:49 UTC (permalink / raw)
To: nicolas.ferre
Cc: netdev, linux-arm-kernel, linux-kernel, avictor.za, plagnioj
In-Reply-To: <3ad9b1273c4f30d42dcc3bdf6d03470111246f14.1335435759.git.nicolas.ferre@atmel.com>
From: Nicolas Ferre <nicolas.ferre@atmel.com>
Date: Thu, 26 Apr 2012 12:30:42 +0200
> From: Andrew Victor <avictor.za@gmail.com>
>
> The AT91RM9200 Ethernet controller still has a fixed IO mapping.
> So:
> * Remove the fixed IO mapping and AT91_VA_BASE_EMAC definition.
> * Pass the physical base-address via platform-resources to the driver.
> * Convert at91_ether.c driver to perform an ioremap().
> * Ethernet PHY detection needs to be performed during the driver
> initialization process, it can no longer be done first.
>
> Signed-off-by: Andrew Victor <linux@maxim.org.za>
> Signed-off-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Applied to net-next.
^ permalink raw reply
* Re: [PATCH 2/2] net/at91_ether: use gpio_to_irq for phy IRQ line
From: David Miller @ 2012-04-29 1:49 UTC (permalink / raw)
To: nicolas.ferre
Cc: netdev, linux-arm-kernel, linux-kernel, avictor.za, plagnioj
In-Reply-To: <cfc991548b5b73c8038f41877da1cd0104945cf5.1335435759.git.nicolas.ferre@atmel.com>
From: Nicolas Ferre <nicolas.ferre@atmel.com>
Date: Thu, 26 Apr 2012 12:30:43 +0200
> Use the gpio_to_irq() function to retrieve the phy IRQ line
> from the GPIO pin specification.
> This fix is needed now that we have moved to irqdomains on AT91.
>
> Reported-by: Jamie Iles <jamie@jamieiles.com>
> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Applied to net-next.
^ permalink raw reply
* Re: [PATCH] net: smsc75xx: fix MDIO access
From: David Miller @ 2012-04-29 1:55 UTC (permalink / raw)
To: fillods; +Cc: netdev, steve.glendinning
In-Reply-To: <20120426195517.GU5277@charybde.local>
From: Stephane Fillod <fillods@users.sf.net>
Date: Thu, 26 Apr 2012 21:55:17 +0200
> * make MDIO read/write to work, the MII_ACCESS_BUSY bit was missing
> to actually trigger the I/O. Rem: the smsc75xx is different from
> the smsc95xx in that regard.
> * fix PHY interrupt acknowledge, which needs a mdio_write
> to clear PHY_INT_SRC instead of a usual read like in smsc95xx.
> * fix bug in phy_init loop that was ignoring BMCR reset bit,
> akin to smsc95xx's d946092000698fd204d82a9d239103c656fb63bf
> * mark link down on startup and let PHY interrupt deal with carrier
> changes, akin to 07d69d4238418746a7b85c5d05ec17c658a2a390
> * declare the smsc75xx's MII as GMII capable
>
> Tested on ARM/Omap3 with LAN7500-CEB.
>
> Signed-off-by: Stephane Fillod <fillods@users.sf.net>
> Cc: Steve Glendinning <steve.glendinning@shawell.net>
Steve, could you give this a quick review?
Thanks.
^ permalink raw reply
* Re: [PATCH net-next 1/2] drivers/net/oki-semi: Remove the definition of PCH_GBE_ETH_ALEN
From: David Miller @ 2012-04-29 2:00 UTC (permalink / raw)
To: roy.qing.li; +Cc: netdev
In-Reply-To: <1335510073-14018-1-git-send-email-roy.qing.li@gmail.com>
From: roy.qing.li@gmail.com
Date: Fri, 27 Apr 2012 15:01:12 +0800
> From: RongQing.Li <roy.qing.li@gmail.com>
>
> PCH_GBE_ETH_ALEN is equal to ETH_ALEN, so we can replace it with
> ETH_ALEN.
>
> If they are not equal, it must be a bug, since this is ethernet,
> and the address has been already stored to mc_addr_list as ETH_ALEN
> bytes when call pch_gbe_mac_mc_addr_list_update.
>
> Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Applied.
^ permalink raw reply
* Re: [PATCH net-next 2/2] drivers/net/oki-semi: Donot recompute IP header checksum
From: David Miller @ 2012-04-29 2:00 UTC (permalink / raw)
To: roy.qing.li; +Cc: netdev
In-Reply-To: <1335510073-14018-2-git-send-email-roy.qing.li@gmail.com>
From: roy.qing.li@gmail.com
Date: Fri, 27 Apr 2012 15:01:13 +0800
> From: RongQing.Li <roy.qing.li@gmail.com>
>
> If I understand correct, NETIF_F_IP_CSUM only means the hardware
> will compute the TCP/UDP checksum, IP checksum is always computed
> in software
>
> So as a workround of hardware unable to compute small packages
> checksum, do not need to compute IP header checksum.
>
> Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Applied.
^ permalink raw reply
* Re: [net-next 0/6][pull request] Intel Wired LAN Driver Updates
From: David Miller @ 2012-04-29 2:08 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: netdev, gospo, sassmann
In-Reply-To: <1335519413-1892-1-git-send-email-jeffrey.t.kirsher@intel.com>
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Fri, 27 Apr 2012 02:36:47 -0700
> This series of patches contains updates for e1000e, igb and
> ixgbe.
>
> The following are changes since commit 82981930125abfd39d7c8378a9cfdf5e1be2002b:
> net: cleanups in sock_setsockopt()
> and are available in the git repository at:
> git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master
Pulled.
I super appreciate the descriptive commit messages, especialy
wrt. hardware workarounds.
Thanks.
^ permalink raw reply
* Re: [PATCH] usbnet: fix skb traversing races during unlink
From: David Miller @ 2012-04-29 2:18 UTC (permalink / raw)
To: tom.leiming; +Cc: gregkh, netdev, linux-usb, huajun.li.lee, oneukum, stable
In-Reply-To: <1335522095-9012-1-git-send-email-tom.leiming@gmail.com>
From: Ming Lei <tom.leiming@gmail.com>
Date: Fri, 27 Apr 2012 18:21:35 +0800
> +/*The caller must hold list->lock*/
Please put spaces in your comments, like this:
/* The caller must hold list->lock */
> +
> + /*speedup unlink by blocking resubmit*/
Same here.
>
> - entry = (struct skb_data *) skb->cb;
> + skb_queue_walk(q, skb) {
> + entry = (struct skb_data *) skb->cb;
> + if (entry->state != unlink_start)
> + break;
> + }
> + if (skb == (struct sk_buff *)q)
> + break;
Please do not expose the internal details of SKB lists
with a test like this. Eventually this will all be
converted to struct list_head and this kind of test
will cause unnecessary pain for such a conversion.
Instead, code it like this, as you would for a loop
using list_for_each*() or similar:
skb_queue_walk(q, skb) {
if (condition)
goto found;
}
/* No matching entry. */
break;
found:
Thanks.
^ permalink raw reply
* Re: [PATCH v2 1/4 net-next] net/ipv6/udp: UDP encapsulation: break backlog_rcv into __udpv6_queue_rcv_skb
From: David Miller @ 2012-04-29 2:31 UTC (permalink / raw)
To: bcrl; +Cc: jchapman, netdev
In-Reply-To: <20120427182321.GE17021@kvack.org>
From: Benjamin LaHaise <bcrl@kvack.org>
Date: Fri, 27 Apr 2012 14:23:21 -0400
> This is the first step in reworking the IPv6 UDP code to be structured more
> like the IPv4 UDP code. This patch creates __udpv6_queue_rcv_skb() with
> the equivalent sematics to __udp_queue_rcv_skb(), and wires it up to the
> backlog_rcv method.
>
> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Applied.
^ permalink raw reply
* Re: [PATCH v2 2/4 net-next] net/ipv6/udp: UDP encapsulation: move socket locking into udpv6_queue_rcv_skb()
From: David Miller @ 2012-04-29 2:31 UTC (permalink / raw)
To: bcrl; +Cc: jchapman, netdev
In-Reply-To: <20120427182358.GF17021@kvack.org>
From: Benjamin LaHaise <bcrl@kvack.org>
Date: Fri, 27 Apr 2012 14:23:59 -0400
> In order to make sure that when the encap_rcv() hook is introduced it is
> not called with the socket lock held, move socket locking from callers into
> udpv6_queue_rcv_skb(), matching what happens in IPv4.
>
> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Applied.
^ permalink raw reply
* Re: [PATCH v2 3/4 net-next] net/ipv6/udp: UDP encapsulation: introduce encap_rcv hook into IPv6
From: David Miller @ 2012-04-29 2:31 UTC (permalink / raw)
To: bcrl; +Cc: jchapman, netdev
In-Reply-To: <20120427182408.GG17021@kvack.org>
From: Benjamin LaHaise <bcrl@kvack.org>
Date: Fri, 27 Apr 2012 14:24:08 -0400
> Now that the sematics of udpv6_queue_rcv_skb() match IPv4's
> udp_queue_rcv_skb(), introduce the UDP encap_rcv() hook for IPv6.
>
> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Applied.
^ permalink raw reply
* Re: [PATCH v2 4/4 net-next] net/l2tp: add support for L2TP over IPv6 UDP
From: David Miller @ 2012-04-29 2:31 UTC (permalink / raw)
To: bcrl; +Cc: jchapman, netdev
In-Reply-To: <20120427182418.GH17021@kvack.org>
From: Benjamin LaHaise <bcrl@kvack.org>
Date: Fri, 27 Apr 2012 14:24:18 -0400
> Now that encap_rcv() works on IPv6 UDP sockets, wire L2TP up to IPv6.
> Support has been tested with and without hardware offloading. This
> version fixes the L2TP over localhost issue with incorrect checksums
> being reported.
>
> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Applied.
^ permalink raw reply
* Re: [PATCH] 9p: disconnect channel when PCI device is removed
From: Sasha Levin @ 2012-04-29 3:21 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: davem, ericvh, jvrao, rusty, netdev, linux-kernel, davej
In-Reply-To: <87d36rafid.fsf@skywalker.in.ibm.com>
On Sat, Apr 28, 2012 at 8:16 PM, Aneesh Kumar K.V
<aneesh.kumar@linux.vnet.ibm.com> wrote:
> Sasha Levin <levinsasha928@gmail.com> writes:
>
>> On Mon, Apr 16, 2012 at 12:59 PM, Aneesh Kumar K.V
>> <aneesh.kumar@linux.vnet.ibm.com> wrote:
>>> Sasha Levin <levinsasha928@gmail.com> writes:
>>>
>>>> On Sun, Apr 15, 2012 at 2:27 PM, Aneesh Kumar K.V
>>>> <aneesh.kumar@linux.vnet.ibm.com> wrote:
>>>>> Sasha Levin <levinsasha928@gmail.com> writes:
>>>>>
>>>>>> When a virtio_9p pci device is being removed, we should close down any
>>>>>> active channels and free up resources, we're not supposed to BUG() if there's
>>>>>> still an open channel since it's a valid case when removing the PCI device.
>>>>>>
>>>>>> Otherwise, removing the PCI device with an open channel would cause the
>>>>>> following BUG():
>>>>>>
>>>>> ...
>>>>>
>>>>>> diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
>>>>>> index 3d43206..5af18d1 100644
>>>>>> --- a/net/9p/trans_virtio.c
>>>>>> +++ b/net/9p/trans_virtio.c
>>>>>> @@ -615,7 +615,8 @@ static void p9_virtio_remove(struct virtio_device *vdev)
>>>>>> {
>>>>>> struct virtio_chan *chan = vdev->priv;
>>>>>>
>>>>>> - BUG_ON(chan->inuse);
>>>>>> + if (chan->inuse)
>>>>>> + p9_virtio_close(chan->client);
>>>>>> vdev->config->del_vqs(vdev);
>>>>>>
>>>>>> mutex_lock(&virtio_9p_lock);
>>>>>
>>>>> But an umount should have resulted in p9_virtio_close ? How are you
>>>>> removing the device ? Are you removing the device with file system
>>>>> mounted ?. In that case may be we should return EBUSY ?
>>>>
>>>> I signal the underlying PCI device to remove (echo 1 >
>>>> /sys/devices/pci0000\:00/[...]/remove), we can't really prevent that
>>>> thing so we must clean up ourselves.
>>>
>>> What does that mean for the mounted file system ? What would happen to
>>> the pending fs operations in that case ?
>>
>> I'm guessing that all of them should be canceled.
>
> Pending operation we can cancel, but what about dirty pages in cached
> mode ? Also how does virtio-blk handle this ?
virtio-blk does pretty much the same thing:
sh-4.2# echo 1 > 0000\:00\:05.0/remove
sh-4.2# ls -al
[hang]
>> virtio-pci simulates
>> a PCI device, if the PCI device is unplugged there's not much to do
>> about the filesystem or pending requests.
>
> Ideal thing to do would be to make remove return -EBUSY; let the user
> umount. But looking at the code, I guess there is no easy way to return
> error from remove callback ?
There's no way to return error, the removal callback is just a notifier.
^ permalink raw reply
* Re: "martian source" mixes up source and destination
From: Michael Tokarev @ 2012-04-29 4:52 UTC (permalink / raw)
To: Mantas M.; +Cc: linux-kernel, netdev
In-Reply-To: <jnh9tl$rth$1@dough.gmane.org>
On 28.04.2012 21:37, Mantas M. wrote:
> After enabling the /proc/sys/net/ipv4/conf/all/log_martians sysctl, I've
> been receiving the following messages on dmesg:
Cc'ing netdev@vger for network-related question.
>> martian source 255.255.255.255 from 192.168.1.115, on dev wlan0
>> ll header: ff:ff:ff:ff:ff:ff:f4:6d:04:85:fc:d7:08:00
>> martian source 192.168.1.255 from 192.168.1.115, on dev wlan0
>> ll header: ff:ff:ff:ff:ff:ff:f4:6d:04:85:fc:d7:08:00
>> martian source 192.168.1.255 from 192.168.1.115, on dev wlan0
>> ll header: ff:ff:ff:ff:ff:ff:f4:6d:04:85:fc:d7:08:00
>> <later>
>> martian source 255.255.255.255 from 192.168.1.183, on dev eth0
>> ll header: ff:ff:ff:ff:ff:ff:48:5d:60:e8:65:8f:08:00
It has been this way since very long time. The meaning of this
message is:
there is a martian source sending packets to 255.255.255.255
from 192.168.1.183 on dev eth0
> However, the packets are *not* martians, but legitimate broadcast
> UDP/IPv4 packets: they have the *destination* as 255.255.255.255 or
> 192.168.1.255, with the source being 192.168.1.115 (wlan0 on this
> device) for the first three packets, and 192.168.1.183 (eth0) on the
> fourth.
log_martians does not check destination address, it checks _source_
address. At least it did so in the past. Please show output of
`ip route' from your machine.
> Linux 3.3.4-1-ARCH on Arch Linux
Thanks,
/mjt
^ permalink raw reply
* Re: kernel panic, inconsistent lock, vanilla 3.3.4, 32bit, tcp
From: Eric Dumazet @ 2012-04-29 7:32 UTC (permalink / raw)
To: Denys Fedoryshchenko; +Cc: netdev, Glauber Costa
In-Reply-To: <04009a9cbb27bf97cb7d86612191b751@visp.net.lb>
On Sun, 2012-04-29 at 00:45 +0300, Denys Fedoryshchenko wrote:
> Hi
>
> Just got repeatable kernel panic on loaded proxy server
>
> Here is the log:
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.903868]
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.903880]
> =================================
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.903890] [ INFO: inconsistent
> lock state ]
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.903903] 3.3.4-build-0061 #8
> Not tainted
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904133]
> ---------------------------------
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904344] inconsistent
> {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] squid/1603
> [HC0[0]:SC0[0]:HE1:SE1] takes:
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] (key#3){+.?...}, at:
> [<c0232cc4>] __percpu_counter_sum+0xd/0x58
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] {IN-SOFTIRQ-W} state
> was registered at:
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0158b84>]
> __lock_acquire+0x284/0xc26
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c01598e8>]
> lock_acquire+0x71/0x85
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0349765>]
> _raw_spin_lock+0x33/0x40
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0232c93>]
> __percpu_counter_add+0x58/0x7c
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c02cfde1>]
> sk_clone_lock+0x1e5/0x200
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0303ee4>]
> inet_csk_clone_lock+0xe/0x78
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0315778>]
> tcp_create_openreq_child+0x1b/0x404
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c031339c>]
> tcp_v4_syn_recv_sock+0x32/0x1c1
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c031615a>]
> tcp_check_req+0x1fd/0x2d7
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0313f77>]
> tcp_v4_do_rcv+0xab/0x194
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c03153bb>]
> tcp_v4_rcv+0x3b3/0x5cc
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c02fc0c4>]
> ip_local_deliver_finish+0x13a/0x1e9
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c02fc539>]
> NF_HOOK.clone.11+0x46/0x4d
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c02fc652>]
> ip_local_deliver+0x41/0x45
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c02fc4d1>]
> ip_rcv_finish+0x31a/0x33c
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c02fc539>]
> NF_HOOK.clone.11+0x46/0x4d
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c02fc857>]
> ip_rcv+0x201/0x23e
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c02daa3a>]
> __netif_receive_skb+0x319/0x368
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c02dac07>]
> netif_receive_skb+0x4e/0x7d
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c02dacf6>]
> napi_skb_finish+0x1e/0x34
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c02db122>]
> napi_gro_receive+0x20/0x24
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<f85d1743>]
> e1000_receive_skb+0x3f/0x45 [e1000e]
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<f85d3464>]
> e1000_clean_rx_irq+0x1f9/0x284 [e1000e]
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<f85d3926>]
> e1000_clean+0x62/0x1f4 [e1000e]
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c02db228>]
> net_rx_action+0x90/0x160
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c012a445>]
> __do_softirq+0x7b/0x118
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] irq event stamp:
> 156915469
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] hardirqs last enabled
> at (156915469): [<c019b4f4>] __slab_alloc.clone.58.clone.63+0xc4/0x2de
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] hardirqs last disabled
> at (156915468): [<c019b452>] __slab_alloc.clone.58.clone.63+0x22/0x2de
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] softirqs last enabled
> at (156915466): [<c02ce677>] lock_sock_nested+0x64/0x6c
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] softirqs last disabled
> at (156915464): [<c0349914>] _raw_spin_lock_bh+0xe/0x45
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542]
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] other info that might
> help us debug this:
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] Possible unsafe
> locking scenario:
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542]
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] CPU0
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] ----
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] lock(key#3);
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] <Interrupt>
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] lock(key#3);
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542]
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] *** DEADLOCK ***
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542]
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] 1 lock held by
> squid/1603:
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] #0:
> (sk_lock-AF_INET){+.+.+.}, at: [<c03055c0>] lock_sock+0xa/0xc
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542]
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] stack backtrace:
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] Pid: 1603, comm: squid
> Not tainted 3.3.4-build-0061 #8
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] Call Trace:
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0347b73>] ?
> printk+0x18/0x1d
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c015873a>]
> valid_state+0x1f6/0x201
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0158816>]
> mark_lock+0xd1/0x1bb
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c015876b>] ?
> mark_lock+0x26/0x1bb
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c015805d>] ?
> check_usage_forwards+0x77/0x77
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0158bf8>]
> __lock_acquire+0x2f8/0xc26
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0159b8e>] ?
> mark_held_locks+0x5d/0x7b
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0159cf6>] ?
> trace_hardirqs_on+0xb/0xd
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0158dd4>] ?
> __lock_acquire+0x4d4/0xc26
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c01598e8>]
> lock_acquire+0x71/0x85
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0232cc4>] ?
> __percpu_counter_sum+0xd/0x58
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0349765>]
> _raw_spin_lock+0x33/0x40
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0232cc4>] ?
> __percpu_counter_sum+0xd/0x58
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0232cc4>]
> __percpu_counter_sum+0xd/0x58
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c02cebc4>]
> __sk_mem_schedule+0xdd/0x1c7
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c02d178d>] ?
> __alloc_skb+0x76/0x100
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0305e8e>]
> sk_wmem_schedule+0x21/0x2d
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0306370>]
> sk_stream_alloc_skb+0x42/0xaa
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0306567>]
> tcp_sendmsg+0x18f/0x68b
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c031f3dc>] ?
> ip_fast_csum+0x30/0x30
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0320193>]
> inet_sendmsg+0x53/0x5a
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c02cb633>]
> sock_aio_write+0xd2/0xda
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c015876b>] ?
> mark_lock+0x26/0x1bb
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c01a1017>]
> do_sync_write+0x9f/0xd9
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c01a2111>] ?
> file_free_rcu+0x2f/0x2f
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c01a17a1>]
> vfs_write+0x8f/0xab
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c01a284d>] ?
> fget_light+0x75/0x7c
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c01a1900>]
> sys_write+0x3d/0x5e
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0349ec9>]
> syscall_call+0x7/0xb
> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0340000>] ?
> rp_sidt+0x41/0x83
>
>
> ---
> Denys Fedoryshchenko
Seems to be related to Glauber Costa work
sk_memory_allocated_add() seems to assume __percpu_counter_add() is irq
safe (truth is : it is not safe)
^ permalink raw reply
* inconsistent lock/deadlock crash, vanilla 3.3.4, 32bit, tcp
From: Denys Fedoryshchenko @ 2012-04-29 7:41 UTC (permalink / raw)
To: netdev, davem, jmorris, yoshfuji, kaber, linux-kernel
I apologize for late night post, and a lot of trash left in report.
I will cleanup it up now, and send with CC to maintainers.
Server job are proxy, with very high rate of new connections.
Deadlock at peaktime can be easily reproduced in 10-15 minutes.
Deadlock occured on almost all 3.3-stable versions (tried 3.3.3 -
3.3.4). It is not easy to try older kernel,
but if required i can try.
Usually also, because SYN rate very high, i can see:
[ 51.612987] TCP: Possible SYN flooding on port 8080. Sending
cookies. Check SNMP counters.
[ 762.903868]
[ 762.903880] =================================
[ 762.903890] [ INFO: inconsistent lock state ]
[ 762.903903] 3.3.4-build-0061 #8 Not tainted
[ 762.904133] ---------------------------------
[ 762.904344] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
[ 762.904542] squid/1603 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 762.904542] (key#3){+.?...}, at: [<c0232cc4>]
__percpu_counter_sum+0xd/0x58
[ 762.904542] {IN-SOFTIRQ-W} state was registered at:
[ 762.904542] [<c0158b84>] __lock_acquire+0x284/0xc26
[ 762.904542] [<c01598e8>] lock_acquire+0x71/0x85
[ 762.904542] [<c0349765>] _raw_spin_lock+0x33/0x40
[ 762.904542] [<c0232c93>] __percpu_counter_add+0x58/0x7c
[ 762.904542] [<c02cfde1>] sk_clone_lock+0x1e5/0x200
[ 762.904542] [<c0303ee4>] inet_csk_clone_lock+0xe/0x78
[ 762.904542] [<c0315778>] tcp_create_openreq_child+0x1b/0x404
[ 762.904542] [<c031339c>] tcp_v4_syn_recv_sock+0x32/0x1c1
[ 762.904542] [<c031615a>] tcp_check_req+0x1fd/0x2d7
[ 762.904542] [<c0313f77>] tcp_v4_do_rcv+0xab/0x194
[ 762.904542] [<c03153bb>] tcp_v4_rcv+0x3b3/0x5cc
[ 762.904542] [<c02fc0c4>] ip_local_deliver_finish+0x13a/0x1e9
[ 762.904542] [<c02fc539>] NF_HOOK.clone.11+0x46/0x4d
[ 762.904542] [<c02fc652>] ip_local_deliver+0x41/0x45
[ 762.904542] [<c02fc4d1>] ip_rcv_finish+0x31a/0x33c
[ 762.904542] [<c02fc539>] NF_HOOK.clone.11+0x46/0x4d
[ 762.904542] [<c02fc857>] ip_rcv+0x201/0x23e
[ 762.904542] [<c02daa3a>] __netif_receive_skb+0x319/0x368
[ 762.904542] [<c02dac07>] netif_receive_skb+0x4e/0x7d
[ 762.904542] [<c02dacf6>] napi_skb_finish+0x1e/0x34
[ 762.904542] [<c02db122>] napi_gro_receive+0x20/0x24
[ 762.904542] [<f85d1743>] e1000_receive_skb+0x3f/0x45 [e1000e]
[ 762.904542] [<f85d3464>] e1000_clean_rx_irq+0x1f9/0x284 [e1000e]
[ 762.904542] [<f85d3926>] e1000_clean+0x62/0x1f4 [e1000e]
[ 762.904542] [<c02db228>] net_rx_action+0x90/0x160
[ 762.904542] [<c012a445>] __do_softirq+0x7b/0x118
[ 762.904542] irq event stamp: 156915469
[ 762.904542] hardirqs last enabled at (156915469): [<c019b4f4>]
__slab_alloc.clone.58.clone.63+0xc4/0x2de
[ 762.904542] hardirqs last disabled at (156915468): [<c019b452>]
__slab_alloc.clone.58.clone.63+0x22/0x2de
[ 762.904542] softirqs last enabled at (156915466): [<c02ce677>]
lock_sock_nested+0x64/0x6c
[ 762.904542] softirqs last disabled at (156915464): [<c0349914>]
_raw_spin_lock_bh+0xe/0x45
[ 762.904542]
[ 762.904542] other info that might help us debug this:
[ 762.904542] Possible unsafe locking scenario:
[ 762.904542]
[ 762.904542] CPU0
[ 762.904542] ----
[ 762.904542] lock(key#3);
[ 762.904542] <Interrupt>
[ 762.904542] lock(key#3);
[ 762.904542]
[ 762.904542] *** DEADLOCK ***
[ 762.904542]
[ 762.904542] 1 lock held by squid/1603:
[ 762.904542] #0: (sk_lock-AF_INET){+.+.+.}, at: [<c03055c0>]
lock_sock+0xa/0xc
[ 762.904542]
[ 762.904542] stack backtrace:
[ 762.904542] Pid: 1603, comm: squid Not tainted 3.3.4-build-0061 #8
[ 762.904542] Call Trace:
[ 762.904542] [<c0347b73>] ? printk+0x18/0x1d
[ 762.904542] [<c015873a>] valid_state+0x1f6/0x201
[ 762.904542] [<c0158816>] mark_lock+0xd1/0x1bb
[ 762.904542] [<c015876b>] ? mark_lock+0x26/0x1bb
[ 762.904542] [<c015805d>] ? check_usage_forwards+0x77/0x77
[ 762.904542] [<c0158bf8>] __lock_acquire+0x2f8/0xc26
[ 762.904542] [<c0159b8e>] ? mark_held_locks+0x5d/0x7b
[ 762.904542] [<c0159cf6>] ? trace_hardirqs_on+0xb/0xd
[ 762.904542] [<c0158dd4>] ? __lock_acquire+0x4d4/0xc26
[ 762.904542] [<c01598e8>] lock_acquire+0x71/0x85
[ 762.904542] [<c0232cc4>] ? __percpu_counter_sum+0xd/0x58
[ 762.904542] [<c0349765>] _raw_spin_lock+0x33/0x40
[ 762.904542] [<c0232cc4>] ? __percpu_counter_sum+0xd/0x58
[ 762.904542] [<c0232cc4>] __percpu_counter_sum+0xd/0x58
[ 762.904542] [<c02cebc4>] __sk_mem_schedule+0xdd/0x1c7
[ 762.904542] [<c02d178d>] ? __alloc_skb+0x76/0x100
[ 762.904542] [<c0305e8e>] sk_wmem_schedule+0x21/0x2d
[ 762.904542] [<c0306370>] sk_stream_alloc_skb+0x42/0xaa
[ 762.904542] [<c0306567>] tcp_sendmsg+0x18f/0x68b
[ 762.904542] [<c031f3dc>] ? ip_fast_csum+0x30/0x30
[ 762.904542] [<c0320193>] inet_sendmsg+0x53/0x5a
[ 762.904542] [<c02cb633>] sock_aio_write+0xd2/0xda
[ 762.904542] [<c015876b>] ? mark_lock+0x26/0x1bb
[ 762.904542] [<c01a1017>] do_sync_write+0x9f/0xd9
[ 762.904542] [<c01a2111>] ? file_free_rcu+0x2f/0x2f
[ 762.904542] [<c01a17a1>] vfs_write+0x8f/0xab
[ 762.904542] [<c01a284d>] ? fget_light+0x75/0x7c
[ 762.904542] [<c01a1900>] sys_write+0x3d/0x5e
[ 762.904542] [<c0349ec9>] syscall_call+0x7/0xb
[ 762.904542] [<c0340000>] ? rp_sidt+0x41/0x83
---
Denys Fedoryshchenko
---
Network engineer
Denys Fedoryshchenko
Dora Highway - Center Cebaco - 2nd Floor
Beirut, Lebanon
Tel: +961 1 247373
E-Mail: denys@visp.net.lb
^ permalink raw reply
* Re: kernel panic, inconsistent lock, vanilla 3.3.4, 32bit, tcp
From: Denys Fedoryshchenko @ 2012-04-29 7:45 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, Glauber Costa
In-Reply-To: <1335684762.2900.69.camel@edumazet-glaptop>
On 2012-04-29 10:32, Eric Dumazet wrote:
> On Sun, 2012-04-29 at 00:45 +0300, Denys Fedoryshchenko wrote:
>> Hi
>>
>> Just got repeatable kernel panic on loaded proxy server
>>
>> Here is the log:
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.903868]
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.903880]
>> =================================
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.903890] [ INFO:
>> inconsistent
>> lock state ]
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.903903] 3.3.4-build-0061 #8
>> Not tainted
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904133]
>> ---------------------------------
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904344] inconsistent
>> {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] squid/1603
>> [HC0[0]:SC0[0]:HE1:SE1] takes:
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] (key#3){+.?...},
>> at:
>> [<c0232cc4>] __percpu_counter_sum+0xd/0x58
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] {IN-SOFTIRQ-W}
>> state
>> was registered at:
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0158b84>]
>> __lock_acquire+0x284/0xc26
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c01598e8>]
>> lock_acquire+0x71/0x85
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0349765>]
>> _raw_spin_lock+0x33/0x40
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0232c93>]
>> __percpu_counter_add+0x58/0x7c
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c02cfde1>]
>> sk_clone_lock+0x1e5/0x200
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0303ee4>]
>> inet_csk_clone_lock+0xe/0x78
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0315778>]
>> tcp_create_openreq_child+0x1b/0x404
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c031339c>]
>> tcp_v4_syn_recv_sock+0x32/0x1c1
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c031615a>]
>> tcp_check_req+0x1fd/0x2d7
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0313f77>]
>> tcp_v4_do_rcv+0xab/0x194
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c03153bb>]
>> tcp_v4_rcv+0x3b3/0x5cc
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c02fc0c4>]
>> ip_local_deliver_finish+0x13a/0x1e9
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c02fc539>]
>> NF_HOOK.clone.11+0x46/0x4d
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c02fc652>]
>> ip_local_deliver+0x41/0x45
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c02fc4d1>]
>> ip_rcv_finish+0x31a/0x33c
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c02fc539>]
>> NF_HOOK.clone.11+0x46/0x4d
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c02fc857>]
>> ip_rcv+0x201/0x23e
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c02daa3a>]
>> __netif_receive_skb+0x319/0x368
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c02dac07>]
>> netif_receive_skb+0x4e/0x7d
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c02dacf6>]
>> napi_skb_finish+0x1e/0x34
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c02db122>]
>> napi_gro_receive+0x20/0x24
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<f85d1743>]
>> e1000_receive_skb+0x3f/0x45 [e1000e]
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<f85d3464>]
>> e1000_clean_rx_irq+0x1f9/0x284 [e1000e]
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<f85d3926>]
>> e1000_clean+0x62/0x1f4 [e1000e]
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c02db228>]
>> net_rx_action+0x90/0x160
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c012a445>]
>> __do_softirq+0x7b/0x118
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] irq event stamp:
>> 156915469
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] hardirqs last
>> enabled
>> at (156915469): [<c019b4f4>]
>> __slab_alloc.clone.58.clone.63+0xc4/0x2de
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] hardirqs last
>> disabled
>> at (156915468): [<c019b452>]
>> __slab_alloc.clone.58.clone.63+0x22/0x2de
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] softirqs last
>> enabled
>> at (156915466): [<c02ce677>] lock_sock_nested+0x64/0x6c
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] softirqs last
>> disabled
>> at (156915464): [<c0349914>] _raw_spin_lock_bh+0xe/0x45
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542]
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] other info that
>> might
>> help us debug this:
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] Possible unsafe
>> locking scenario:
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542]
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] CPU0
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] ----
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] lock(key#3);
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] <Interrupt>
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] lock(key#3);
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542]
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] *** DEADLOCK ***
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542]
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] 1 lock held by
>> squid/1603:
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] #0:
>> (sk_lock-AF_INET){+.+.+.}, at: [<c03055c0>] lock_sock+0xa/0xc
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542]
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] stack backtrace:
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] Pid: 1603, comm:
>> squid
>> Not tainted 3.3.4-build-0061 #8
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] Call Trace:
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0347b73>] ?
>> printk+0x18/0x1d
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c015873a>]
>> valid_state+0x1f6/0x201
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0158816>]
>> mark_lock+0xd1/0x1bb
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c015876b>] ?
>> mark_lock+0x26/0x1bb
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c015805d>] ?
>> check_usage_forwards+0x77/0x77
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0158bf8>]
>> __lock_acquire+0x2f8/0xc26
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0159b8e>] ?
>> mark_held_locks+0x5d/0x7b
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0159cf6>] ?
>> trace_hardirqs_on+0xb/0xd
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0158dd4>] ?
>> __lock_acquire+0x4d4/0xc26
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c01598e8>]
>> lock_acquire+0x71/0x85
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0232cc4>] ?
>> __percpu_counter_sum+0xd/0x58
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0349765>]
>> _raw_spin_lock+0x33/0x40
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0232cc4>] ?
>> __percpu_counter_sum+0xd/0x58
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0232cc4>]
>> __percpu_counter_sum+0xd/0x58
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c02cebc4>]
>> __sk_mem_schedule+0xdd/0x1c7
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c02d178d>] ?
>> __alloc_skb+0x76/0x100
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0305e8e>]
>> sk_wmem_schedule+0x21/0x2d
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0306370>]
>> sk_stream_alloc_skb+0x42/0xaa
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0306567>]
>> tcp_sendmsg+0x18f/0x68b
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c031f3dc>] ?
>> ip_fast_csum+0x30/0x30
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0320193>]
>> inet_sendmsg+0x53/0x5a
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c02cb633>]
>> sock_aio_write+0xd2/0xda
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c015876b>] ?
>> mark_lock+0x26/0x1bb
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c01a1017>]
>> do_sync_write+0x9f/0xd9
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c01a2111>] ?
>> file_free_rcu+0x2f/0x2f
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c01a17a1>]
>> vfs_write+0x8f/0xab
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c01a284d>] ?
>> fget_light+0x75/0x7c
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c01a1900>]
>> sys_write+0x3d/0x5e
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0349ec9>]
>> syscall_call+0x7/0xb
>> Apr 29 00:38:17 10.0.15.2 kernel: [ 762.904542] [<c0340000>] ?
>> rp_sidt+0x41/0x83
>>
>>
>> ---
>> Denys Fedoryshchenko
>
> Seems to be related to Glauber Costa work
>
> sk_memory_allocated_add() seems to assume __percpu_counter_add() is
> irq
> safe (truth is : it is not safe)
Thanks Eric, and sorry for mangled post (long lines). I was very
sleepy, and resent it properly now.
I will try to check his patches.
^ permalink raw reply
* Re: kernel panic, inconsistent lock, vanilla 3.3.4, 32bit, tcp
From: Denys Fedoryshchenko @ 2012-04-29 8:00 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, Glauber Costa
In-Reply-To: <1335684762.2900.69.camel@edumazet-glaptop>
On 2012-04-29 10:32, Eric Dumazet wrote:
> On Sun, 2012-04-29 at 00:45 +0300, Denys Fedoryshchenko wrote:
>> Hi
>>
>> Just got repeatable kernel panic on loaded proxy server
>>
>> Here is the log:
>>
>> ---
>> Denys Fedoryshchenko
>
> Seems to be related to Glauber Costa work
>
> sk_memory_allocated_add() seems to assume __percpu_counter_add() is
> irq
> safe (truth is : it is not safe)
Not sure if it is related, but at my case, CONFIG_CGROUPS is not set
^ permalink raw reply
* Re: kernel panic, inconsistent lock, vanilla 3.3.4, 32bit, tcp
From: Eric Dumazet @ 2012-04-29 8:17 UTC (permalink / raw)
To: Denys Fedoryshchenko; +Cc: netdev, Glauber Costa
In-Reply-To: <6e1a75b751372f0ca5a58c2d0122a3cc@visp.net.lb>
On Sun, 2012-04-29 at 11:00 +0300, Denys Fedoryshchenko wrote:
> On 2012-04-29 10:32, Eric Dumazet wrote:
> > On Sun, 2012-04-29 at 00:45 +0300, Denys Fedoryshchenko wrote:
> >> Hi
> >>
> >> Just got repeatable kernel panic on loaded proxy server
> >>
> >> Here is the log:
>
> >>
> >> ---
> >> Denys Fedoryshchenko
> >
> > Seems to be related to Glauber Costa work
> >
> > sk_memory_allocated_add() seems to assume __percpu_counter_add() is
> > irq
> > safe (truth is : it is not safe)
>
> Not sure if it is related, but at my case, CONFIG_CGROUPS is not set
>
Hmm, so its an old bug...
I'll take a look, thanks a lot !
(I am going to reply on your new thread since its better formatted)
^ permalink raw reply
* Re: inconsistent lock/deadlock crash, vanilla 3.3.4, 32bit, tcp
From: Eric Dumazet @ 2012-04-29 8:27 UTC (permalink / raw)
To: Denys Fedoryshchenko
Cc: netdev, davem, jmorris, yoshfuji, kaber, linux-kernel,
Glauber Costa
In-Reply-To: <13fc59c8e3be34ddb3c7bb5c17b1bdf8@visp.net.lb>
On Sun, 2012-04-29 at 10:41 +0300, Denys Fedoryshchenko wrote:
> I apologize for late night post, and a lot of trash left in report.
> I will cleanup it up now, and send with CC to maintainers.
>
> Server job are proxy, with very high rate of new connections.
> Deadlock at peaktime can be easily reproduced in 10-15 minutes.
>
> Deadlock occured on almost all 3.3-stable versions (tried 3.3.3 -
> 3.3.4). It is not easy to try older kernel,
> but if required i can try.
> Usually also, because SYN rate very high, i can see:
> [ 51.612987] TCP: Possible SYN flooding on port 8080. Sending
> cookies. Check SNMP counters.
>
> [ 762.903868]
> [ 762.903880] =================================
> [ 762.903890] [ INFO: inconsistent lock state ]
> [ 762.903903] 3.3.4-build-0061 #8 Not tainted
> [ 762.904133] ---------------------------------
> [ 762.904344] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
> [ 762.904542] squid/1603 [HC0[0]:SC0[0]:HE1:SE1] takes:
> [ 762.904542] (key#3){+.?...}, at: [<c0232cc4>]
> __percpu_counter_sum+0xd/0x58
> [ 762.904542] {IN-SOFTIRQ-W} state was registered at:
> [ 762.904542] [<c0158b84>] __lock_acquire+0x284/0xc26
> [ 762.904542] [<c01598e8>] lock_acquire+0x71/0x85
> [ 762.904542] [<c0349765>] _raw_spin_lock+0x33/0x40
> [ 762.904542] [<c0232c93>] __percpu_counter_add+0x58/0x7c
> [ 762.904542] [<c02cfde1>] sk_clone_lock+0x1e5/0x200
> [ 762.904542] [<c0303ee4>] inet_csk_clone_lock+0xe/0x78
> [ 762.904542] [<c0315778>] tcp_create_openreq_child+0x1b/0x404
> [ 762.904542] [<c031339c>] tcp_v4_syn_recv_sock+0x32/0x1c1
> [ 762.904542] [<c031615a>] tcp_check_req+0x1fd/0x2d7
> [ 762.904542] [<c0313f77>] tcp_v4_do_rcv+0xab/0x194
> [ 762.904542] [<c03153bb>] tcp_v4_rcv+0x3b3/0x5cc
> [ 762.904542] [<c02fc0c4>] ip_local_deliver_finish+0x13a/0x1e9
> [ 762.904542] [<c02fc539>] NF_HOOK.clone.11+0x46/0x4d
> [ 762.904542] [<c02fc652>] ip_local_deliver+0x41/0x45
> [ 762.904542] [<c02fc4d1>] ip_rcv_finish+0x31a/0x33c
> [ 762.904542] [<c02fc539>] NF_HOOK.clone.11+0x46/0x4d
> [ 762.904542] [<c02fc857>] ip_rcv+0x201/0x23e
> [ 762.904542] [<c02daa3a>] __netif_receive_skb+0x319/0x368
> [ 762.904542] [<c02dac07>] netif_receive_skb+0x4e/0x7d
> [ 762.904542] [<c02dacf6>] napi_skb_finish+0x1e/0x34
> [ 762.904542] [<c02db122>] napi_gro_receive+0x20/0x24
> [ 762.904542] [<f85d1743>] e1000_receive_skb+0x3f/0x45 [e1000e]
> [ 762.904542] [<f85d3464>] e1000_clean_rx_irq+0x1f9/0x284 [e1000e]
> [ 762.904542] [<f85d3926>] e1000_clean+0x62/0x1f4 [e1000e]
> [ 762.904542] [<c02db228>] net_rx_action+0x90/0x160
> [ 762.904542] [<c012a445>] __do_softirq+0x7b/0x118
> [ 762.904542] irq event stamp: 156915469
> [ 762.904542] hardirqs last enabled at (156915469): [<c019b4f4>]
> __slab_alloc.clone.58.clone.63+0xc4/0x2de
> [ 762.904542] hardirqs last disabled at (156915468): [<c019b452>]
> __slab_alloc.clone.58.clone.63+0x22/0x2de
> [ 762.904542] softirqs last enabled at (156915466): [<c02ce677>]
> lock_sock_nested+0x64/0x6c
> [ 762.904542] softirqs last disabled at (156915464): [<c0349914>]
> _raw_spin_lock_bh+0xe/0x45
> [ 762.904542]
> [ 762.904542] other info that might help us debug this:
> [ 762.904542] Possible unsafe locking scenario:
> [ 762.904542]
> [ 762.904542] CPU0
> [ 762.904542] ----
> [ 762.904542] lock(key#3);
> [ 762.904542] <Interrupt>
> [ 762.904542] lock(key#3);
> [ 762.904542]
> [ 762.904542] *** DEADLOCK ***
> [ 762.904542]
> [ 762.904542] 1 lock held by squid/1603:
> [ 762.904542] #0: (sk_lock-AF_INET){+.+.+.}, at: [<c03055c0>]
> lock_sock+0xa/0xc
> [ 762.904542]
> [ 762.904542] stack backtrace:
> [ 762.904542] Pid: 1603, comm: squid Not tainted 3.3.4-build-0061 #8
> [ 762.904542] Call Trace:
> [ 762.904542] [<c0347b73>] ? printk+0x18/0x1d
> [ 762.904542] [<c015873a>] valid_state+0x1f6/0x201
> [ 762.904542] [<c0158816>] mark_lock+0xd1/0x1bb
> [ 762.904542] [<c015876b>] ? mark_lock+0x26/0x1bb
> [ 762.904542] [<c015805d>] ? check_usage_forwards+0x77/0x77
> [ 762.904542] [<c0158bf8>] __lock_acquire+0x2f8/0xc26
> [ 762.904542] [<c0159b8e>] ? mark_held_locks+0x5d/0x7b
> [ 762.904542] [<c0159cf6>] ? trace_hardirqs_on+0xb/0xd
> [ 762.904542] [<c0158dd4>] ? __lock_acquire+0x4d4/0xc26
> [ 762.904542] [<c01598e8>] lock_acquire+0x71/0x85
> [ 762.904542] [<c0232cc4>] ? __percpu_counter_sum+0xd/0x58
> [ 762.904542] [<c0349765>] _raw_spin_lock+0x33/0x40
> [ 762.904542] [<c0232cc4>] ? __percpu_counter_sum+0xd/0x58
> [ 762.904542] [<c0232cc4>] __percpu_counter_sum+0xd/0x58
> [ 762.904542] [<c02cebc4>] __sk_mem_schedule+0xdd/0x1c7
> [ 762.904542] [<c02d178d>] ? __alloc_skb+0x76/0x100
> [ 762.904542] [<c0305e8e>] sk_wmem_schedule+0x21/0x2d
> [ 762.904542] [<c0306370>] sk_stream_alloc_skb+0x42/0xaa
> [ 762.904542] [<c0306567>] tcp_sendmsg+0x18f/0x68b
> [ 762.904542] [<c031f3dc>] ? ip_fast_csum+0x30/0x30
> [ 762.904542] [<c0320193>] inet_sendmsg+0x53/0x5a
> [ 762.904542] [<c02cb633>] sock_aio_write+0xd2/0xda
> [ 762.904542] [<c015876b>] ? mark_lock+0x26/0x1bb
> [ 762.904542] [<c01a1017>] do_sync_write+0x9f/0xd9
> [ 762.904542] [<c01a2111>] ? file_free_rcu+0x2f/0x2f
> [ 762.904542] [<c01a17a1>] vfs_write+0x8f/0xab
> [ 762.904542] [<c01a284d>] ? fget_light+0x75/0x7c
> [ 762.904542] [<c01a1900>] sys_write+0x3d/0x5e
> [ 762.904542] [<c0349ec9>] syscall_call+0x7/0xb
> [ 762.904542] [<c0340000>] ? rp_sidt+0x41/0x83
>
OK, so when we have memory pressure we can call
percpu_counter_read_positive() with SOFTIRQ enabled, and lockdep
complains...
This bug was probably added in 2008, in commit 1748376b6626a
(net: Use a percpu_counter for sockets_allocated)
I'll have to backport the following patch, can you test it please ?
Thanks !
net/core/sock.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/core/sock.c b/net/core/sock.c
index b2e14c0..08fc929 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1818,7 +1818,9 @@ int __sk_mem_schedule(struct sock *sk, int size, int kind)
if (!sk_under_memory_pressure(sk))
return 1;
+ local_bh_disable();
alloc = sk_sockets_allocated_read_positive(sk);
+ local_bh_enable();
if (sk_prot_mem_limits(sk, 2) > alloc *
sk_mem_pages(sk->sk_wmem_queued +
atomic_read(&sk->sk_rmem_alloc) +
^ permalink raw reply related
* Re: inconsistent lock/deadlock crash, vanilla 3.3.4, 32bit, tcp
From: Eric Dumazet @ 2012-04-29 8:40 UTC (permalink / raw)
To: Denys Fedoryshchenko
Cc: netdev, davem, jmorris, yoshfuji, kaber, linux-kernel,
Glauber Costa
In-Reply-To: <1335688036.2900.80.camel@edumazet-glaptop>
On Sun, 2012-04-29 at 10:27 +0200, Eric Dumazet wrote:
> On Sun, 2012-04-29 at 10:41 +0300, Denys Fedoryshchenko wrote:
> > I apologize for late night post, and a lot of trash left in report.
> > I will cleanup it up now, and send with CC to maintainers.
> >
> > Server job are proxy, with very high rate of new connections.
> > Deadlock at peaktime can be easily reproduced in 10-15 minutes.
> >
> > Deadlock occured on almost all 3.3-stable versions (tried 3.3.3 -
> > 3.3.4). It is not easy to try older kernel,
> > but if required i can try.
> > Usually also, because SYN rate very high, i can see:
> > [ 51.612987] TCP: Possible SYN flooding on port 8080. Sending
> > cookies. Check SNMP counters.
> >
> > [ 762.903868]
> > [ 762.903880] =================================
> > [ 762.903890] [ INFO: inconsistent lock state ]
> > [ 762.903903] 3.3.4-build-0061 #8 Not tainted
> > [ 762.904133] ---------------------------------
> > [ 762.904344] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
> > [ 762.904542] squid/1603 [HC0[0]:SC0[0]:HE1:SE1] takes:
> > [ 762.904542] (key#3){+.?...}, at: [<c0232cc4>]
> > __percpu_counter_sum+0xd/0x58
> > [ 762.904542] {IN-SOFTIRQ-W} state was registered at:
> > [ 762.904542] [<c0158b84>] __lock_acquire+0x284/0xc26
> > [ 762.904542] [<c01598e8>] lock_acquire+0x71/0x85
> > [ 762.904542] [<c0349765>] _raw_spin_lock+0x33/0x40
> > [ 762.904542] [<c0232c93>] __percpu_counter_add+0x58/0x7c
> > [ 762.904542] [<c02cfde1>] sk_clone_lock+0x1e5/0x200
> > [ 762.904542] [<c0303ee4>] inet_csk_clone_lock+0xe/0x78
> > [ 762.904542] [<c0315778>] tcp_create_openreq_child+0x1b/0x404
> > [ 762.904542] [<c031339c>] tcp_v4_syn_recv_sock+0x32/0x1c1
> > [ 762.904542] [<c031615a>] tcp_check_req+0x1fd/0x2d7
> > [ 762.904542] [<c0313f77>] tcp_v4_do_rcv+0xab/0x194
> > [ 762.904542] [<c03153bb>] tcp_v4_rcv+0x3b3/0x5cc
> > [ 762.904542] [<c02fc0c4>] ip_local_deliver_finish+0x13a/0x1e9
> > [ 762.904542] [<c02fc539>] NF_HOOK.clone.11+0x46/0x4d
> > [ 762.904542] [<c02fc652>] ip_local_deliver+0x41/0x45
> > [ 762.904542] [<c02fc4d1>] ip_rcv_finish+0x31a/0x33c
> > [ 762.904542] [<c02fc539>] NF_HOOK.clone.11+0x46/0x4d
> > [ 762.904542] [<c02fc857>] ip_rcv+0x201/0x23e
> > [ 762.904542] [<c02daa3a>] __netif_receive_skb+0x319/0x368
> > [ 762.904542] [<c02dac07>] netif_receive_skb+0x4e/0x7d
> > [ 762.904542] [<c02dacf6>] napi_skb_finish+0x1e/0x34
> > [ 762.904542] [<c02db122>] napi_gro_receive+0x20/0x24
> > [ 762.904542] [<f85d1743>] e1000_receive_skb+0x3f/0x45 [e1000e]
> > [ 762.904542] [<f85d3464>] e1000_clean_rx_irq+0x1f9/0x284 [e1000e]
> > [ 762.904542] [<f85d3926>] e1000_clean+0x62/0x1f4 [e1000e]
> > [ 762.904542] [<c02db228>] net_rx_action+0x90/0x160
> > [ 762.904542] [<c012a445>] __do_softirq+0x7b/0x118
> > [ 762.904542] irq event stamp: 156915469
> > [ 762.904542] hardirqs last enabled at (156915469): [<c019b4f4>]
> > __slab_alloc.clone.58.clone.63+0xc4/0x2de
> > [ 762.904542] hardirqs last disabled at (156915468): [<c019b452>]
> > __slab_alloc.clone.58.clone.63+0x22/0x2de
> > [ 762.904542] softirqs last enabled at (156915466): [<c02ce677>]
> > lock_sock_nested+0x64/0x6c
> > [ 762.904542] softirqs last disabled at (156915464): [<c0349914>]
> > _raw_spin_lock_bh+0xe/0x45
> > [ 762.904542]
> > [ 762.904542] other info that might help us debug this:
> > [ 762.904542] Possible unsafe locking scenario:
> > [ 762.904542]
> > [ 762.904542] CPU0
> > [ 762.904542] ----
> > [ 762.904542] lock(key#3);
> > [ 762.904542] <Interrupt>
> > [ 762.904542] lock(key#3);
> > [ 762.904542]
> > [ 762.904542] *** DEADLOCK ***
> > [ 762.904542]
> > [ 762.904542] 1 lock held by squid/1603:
> > [ 762.904542] #0: (sk_lock-AF_INET){+.+.+.}, at: [<c03055c0>]
> > lock_sock+0xa/0xc
> > [ 762.904542]
> > [ 762.904542] stack backtrace:
> > [ 762.904542] Pid: 1603, comm: squid Not tainted 3.3.4-build-0061 #8
> > [ 762.904542] Call Trace:
> > [ 762.904542] [<c0347b73>] ? printk+0x18/0x1d
> > [ 762.904542] [<c015873a>] valid_state+0x1f6/0x201
> > [ 762.904542] [<c0158816>] mark_lock+0xd1/0x1bb
> > [ 762.904542] [<c015876b>] ? mark_lock+0x26/0x1bb
> > [ 762.904542] [<c015805d>] ? check_usage_forwards+0x77/0x77
> > [ 762.904542] [<c0158bf8>] __lock_acquire+0x2f8/0xc26
> > [ 762.904542] [<c0159b8e>] ? mark_held_locks+0x5d/0x7b
> > [ 762.904542] [<c0159cf6>] ? trace_hardirqs_on+0xb/0xd
> > [ 762.904542] [<c0158dd4>] ? __lock_acquire+0x4d4/0xc26
> > [ 762.904542] [<c01598e8>] lock_acquire+0x71/0x85
> > [ 762.904542] [<c0232cc4>] ? __percpu_counter_sum+0xd/0x58
> > [ 762.904542] [<c0349765>] _raw_spin_lock+0x33/0x40
> > [ 762.904542] [<c0232cc4>] ? __percpu_counter_sum+0xd/0x58
> > [ 762.904542] [<c0232cc4>] __percpu_counter_sum+0xd/0x58
> > [ 762.904542] [<c02cebc4>] __sk_mem_schedule+0xdd/0x1c7
> > [ 762.904542] [<c02d178d>] ? __alloc_skb+0x76/0x100
> > [ 762.904542] [<c0305e8e>] sk_wmem_schedule+0x21/0x2d
> > [ 762.904542] [<c0306370>] sk_stream_alloc_skb+0x42/0xaa
> > [ 762.904542] [<c0306567>] tcp_sendmsg+0x18f/0x68b
> > [ 762.904542] [<c031f3dc>] ? ip_fast_csum+0x30/0x30
> > [ 762.904542] [<c0320193>] inet_sendmsg+0x53/0x5a
> > [ 762.904542] [<c02cb633>] sock_aio_write+0xd2/0xda
> > [ 762.904542] [<c015876b>] ? mark_lock+0x26/0x1bb
> > [ 762.904542] [<c01a1017>] do_sync_write+0x9f/0xd9
> > [ 762.904542] [<c01a2111>] ? file_free_rcu+0x2f/0x2f
> > [ 762.904542] [<c01a17a1>] vfs_write+0x8f/0xab
> > [ 762.904542] [<c01a284d>] ? fget_light+0x75/0x7c
> > [ 762.904542] [<c01a1900>] sys_write+0x3d/0x5e
> > [ 762.904542] [<c0349ec9>] syscall_call+0x7/0xb
> > [ 762.904542] [<c0340000>] ? rp_sidt+0x41/0x83
> >
>
>
> OK, so when we have memory pressure we can call
> percpu_counter_read_positive() with SOFTIRQ enabled, and lockdep
> complains...
>
> This bug was probably added in 2008, in commit 1748376b6626a
> (net: Use a percpu_counter for sockets_allocated)
Hmm, no this patch was fine.
Bug was in fact added by Glauber Costa in commit 180d8cd942ce336b2c869
(foundations of per-cgroup memory pressure controlling.)
Because he replaced the safe percpu_counter_read_positive() call to
unsafe percpu_counter_sum_positive() in
sk_sockets_allocated_read_positive()
But anyway the patch I sent should fix the problem.
>
> I'll have to backport the following patch, can you test it please ?
>
> Thanks !
>
> net/core/sock.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/net/core/sock.c b/net/core/sock.c
> index b2e14c0..08fc929 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1818,7 +1818,9 @@ int __sk_mem_schedule(struct sock *sk, int size, int kind)
>
> if (!sk_under_memory_pressure(sk))
> return 1;
> + local_bh_disable();
> alloc = sk_sockets_allocated_read_positive(sk);
> + local_bh_enable();
> if (sk_prot_mem_limits(sk, 2) > alloc *
> sk_mem_pages(sk->sk_wmem_queued +
> atomic_read(&sk->sk_rmem_alloc) +
>
>
>
>
>
>
>
^ permalink raw reply
* Re: inconsistent lock/deadlock crash, vanilla 3.3.4, 32bit, tcp
From: Eric Dumazet @ 2012-04-29 8:44 UTC (permalink / raw)
To: Denys Fedoryshchenko
Cc: netdev, davem, jmorris, yoshfuji, kaber, linux-kernel,
Glauber Costa
In-Reply-To: <1335688804.2900.83.camel@edumazet-glaptop>
On Sun, 2012-04-29 at 10:40 +0200, Eric Dumazet wrote:
> On Sun, 2012-04-29 at 10:27 +0200, Eric Dumazet wrote:
> > On Sun, 2012-04-29 at 10:41 +0300, Denys Fedoryshchenko wrote:
> > > I apologize for late night post, and a lot of trash left in report.
> > > I will cleanup it up now, and send with CC to maintainers.
> > >
> > > Server job are proxy, with very high rate of new connections.
> > > Deadlock at peaktime can be easily reproduced in 10-15 minutes.
> > >
> > > Deadlock occured on almost all 3.3-stable versions (tried 3.3.3 -
> > > 3.3.4). It is not easy to try older kernel,
> > > but if required i can try.
> > > Usually also, because SYN rate very high, i can see:
> > > [ 51.612987] TCP: Possible SYN flooding on port 8080. Sending
> > > cookies. Check SNMP counters.
> > >
> > > [ 762.903868]
> > > [ 762.903880] =================================
> > > [ 762.903890] [ INFO: inconsistent lock state ]
> > > [ 762.903903] 3.3.4-build-0061 #8 Not tainted
> > > [ 762.904133] ---------------------------------
> > > [ 762.904344] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
> > > [ 762.904542] squid/1603 [HC0[0]:SC0[0]:HE1:SE1] takes:
> > > [ 762.904542] (key#3){+.?...}, at: [<c0232cc4>]
> > > __percpu_counter_sum+0xd/0x58
> > > [ 762.904542] {IN-SOFTIRQ-W} state was registered at:
> > > [ 762.904542] [<c0158b84>] __lock_acquire+0x284/0xc26
> > > [ 762.904542] [<c01598e8>] lock_acquire+0x71/0x85
> > > [ 762.904542] [<c0349765>] _raw_spin_lock+0x33/0x40
> > > [ 762.904542] [<c0232c93>] __percpu_counter_add+0x58/0x7c
> > > [ 762.904542] [<c02cfde1>] sk_clone_lock+0x1e5/0x200
> > > [ 762.904542] [<c0303ee4>] inet_csk_clone_lock+0xe/0x78
> > > [ 762.904542] [<c0315778>] tcp_create_openreq_child+0x1b/0x404
> > > [ 762.904542] [<c031339c>] tcp_v4_syn_recv_sock+0x32/0x1c1
> > > [ 762.904542] [<c031615a>] tcp_check_req+0x1fd/0x2d7
> > > [ 762.904542] [<c0313f77>] tcp_v4_do_rcv+0xab/0x194
> > > [ 762.904542] [<c03153bb>] tcp_v4_rcv+0x3b3/0x5cc
> > > [ 762.904542] [<c02fc0c4>] ip_local_deliver_finish+0x13a/0x1e9
> > > [ 762.904542] [<c02fc539>] NF_HOOK.clone.11+0x46/0x4d
> > > [ 762.904542] [<c02fc652>] ip_local_deliver+0x41/0x45
> > > [ 762.904542] [<c02fc4d1>] ip_rcv_finish+0x31a/0x33c
> > > [ 762.904542] [<c02fc539>] NF_HOOK.clone.11+0x46/0x4d
> > > [ 762.904542] [<c02fc857>] ip_rcv+0x201/0x23e
> > > [ 762.904542] [<c02daa3a>] __netif_receive_skb+0x319/0x368
> > > [ 762.904542] [<c02dac07>] netif_receive_skb+0x4e/0x7d
> > > [ 762.904542] [<c02dacf6>] napi_skb_finish+0x1e/0x34
> > > [ 762.904542] [<c02db122>] napi_gro_receive+0x20/0x24
> > > [ 762.904542] [<f85d1743>] e1000_receive_skb+0x3f/0x45 [e1000e]
> > > [ 762.904542] [<f85d3464>] e1000_clean_rx_irq+0x1f9/0x284 [e1000e]
> > > [ 762.904542] [<f85d3926>] e1000_clean+0x62/0x1f4 [e1000e]
> > > [ 762.904542] [<c02db228>] net_rx_action+0x90/0x160
> > > [ 762.904542] [<c012a445>] __do_softirq+0x7b/0x118
> > > [ 762.904542] irq event stamp: 156915469
> > > [ 762.904542] hardirqs last enabled at (156915469): [<c019b4f4>]
> > > __slab_alloc.clone.58.clone.63+0xc4/0x2de
> > > [ 762.904542] hardirqs last disabled at (156915468): [<c019b452>]
> > > __slab_alloc.clone.58.clone.63+0x22/0x2de
> > > [ 762.904542] softirqs last enabled at (156915466): [<c02ce677>]
> > > lock_sock_nested+0x64/0x6c
> > > [ 762.904542] softirqs last disabled at (156915464): [<c0349914>]
> > > _raw_spin_lock_bh+0xe/0x45
> > > [ 762.904542]
> > > [ 762.904542] other info that might help us debug this:
> > > [ 762.904542] Possible unsafe locking scenario:
> > > [ 762.904542]
> > > [ 762.904542] CPU0
> > > [ 762.904542] ----
> > > [ 762.904542] lock(key#3);
> > > [ 762.904542] <Interrupt>
> > > [ 762.904542] lock(key#3);
> > > [ 762.904542]
> > > [ 762.904542] *** DEADLOCK ***
> > > [ 762.904542]
> > > [ 762.904542] 1 lock held by squid/1603:
> > > [ 762.904542] #0: (sk_lock-AF_INET){+.+.+.}, at: [<c03055c0>]
> > > lock_sock+0xa/0xc
> > > [ 762.904542]
> > > [ 762.904542] stack backtrace:
> > > [ 762.904542] Pid: 1603, comm: squid Not tainted 3.3.4-build-0061 #8
> > > [ 762.904542] Call Trace:
> > > [ 762.904542] [<c0347b73>] ? printk+0x18/0x1d
> > > [ 762.904542] [<c015873a>] valid_state+0x1f6/0x201
> > > [ 762.904542] [<c0158816>] mark_lock+0xd1/0x1bb
> > > [ 762.904542] [<c015876b>] ? mark_lock+0x26/0x1bb
> > > [ 762.904542] [<c015805d>] ? check_usage_forwards+0x77/0x77
> > > [ 762.904542] [<c0158bf8>] __lock_acquire+0x2f8/0xc26
> > > [ 762.904542] [<c0159b8e>] ? mark_held_locks+0x5d/0x7b
> > > [ 762.904542] [<c0159cf6>] ? trace_hardirqs_on+0xb/0xd
> > > [ 762.904542] [<c0158dd4>] ? __lock_acquire+0x4d4/0xc26
> > > [ 762.904542] [<c01598e8>] lock_acquire+0x71/0x85
> > > [ 762.904542] [<c0232cc4>] ? __percpu_counter_sum+0xd/0x58
> > > [ 762.904542] [<c0349765>] _raw_spin_lock+0x33/0x40
> > > [ 762.904542] [<c0232cc4>] ? __percpu_counter_sum+0xd/0x58
> > > [ 762.904542] [<c0232cc4>] __percpu_counter_sum+0xd/0x58
> > > [ 762.904542] [<c02cebc4>] __sk_mem_schedule+0xdd/0x1c7
> > > [ 762.904542] [<c02d178d>] ? __alloc_skb+0x76/0x100
> > > [ 762.904542] [<c0305e8e>] sk_wmem_schedule+0x21/0x2d
> > > [ 762.904542] [<c0306370>] sk_stream_alloc_skb+0x42/0xaa
> > > [ 762.904542] [<c0306567>] tcp_sendmsg+0x18f/0x68b
> > > [ 762.904542] [<c031f3dc>] ? ip_fast_csum+0x30/0x30
> > > [ 762.904542] [<c0320193>] inet_sendmsg+0x53/0x5a
> > > [ 762.904542] [<c02cb633>] sock_aio_write+0xd2/0xda
> > > [ 762.904542] [<c015876b>] ? mark_lock+0x26/0x1bb
> > > [ 762.904542] [<c01a1017>] do_sync_write+0x9f/0xd9
> > > [ 762.904542] [<c01a2111>] ? file_free_rcu+0x2f/0x2f
> > > [ 762.904542] [<c01a17a1>] vfs_write+0x8f/0xab
> > > [ 762.904542] [<c01a284d>] ? fget_light+0x75/0x7c
> > > [ 762.904542] [<c01a1900>] sys_write+0x3d/0x5e
> > > [ 762.904542] [<c0349ec9>] syscall_call+0x7/0xb
> > > [ 762.904542] [<c0340000>] ? rp_sidt+0x41/0x83
> > >
> >
> >
> > OK, so when we have memory pressure we can call
> > percpu_counter_read_positive() with SOFTIRQ enabled, and lockdep
> > complains...
> >
> > This bug was probably added in 2008, in commit 1748376b6626a
> > (net: Use a percpu_counter for sockets_allocated)
>
> Hmm, no this patch was fine.
>
> Bug was in fact added by Glauber Costa in commit 180d8cd942ce336b2c869
> (foundations of per-cgroup memory pressure controlling.)
>
> Because he replaced the safe percpu_counter_read_positive() call to
> unsafe percpu_counter_sum_positive() in
> sk_sockets_allocated_read_positive()
>
> But anyway the patch I sent should fix the problem.
Thinking again, I am not sure why Glauber did this change. He probably
made a typo or something.
Maybe we should revert and use the following patch instead, since we
dont need a precise sum at this point.
This better matches "sk_sockets_allocated_read_positive" intent anyway
diff --git a/include/net/sock.h b/include/net/sock.h
index 188532e..5a0a58a 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1129,9 +1129,9 @@ sk_sockets_allocated_read_positive(struct sock *sk)
struct proto *prot = sk->sk_prot;
if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
- return percpu_counter_sum_positive(sk->sk_cgrp->sockets_allocated);
+ return percpu_counter_read_positive(sk->sk_cgrp->sockets_allocated);
- return percpu_counter_sum_positive(prot->sockets_allocated);
+ return percpu_counter_read_positive(prot->sockets_allocated);
}
static inline int
^ permalink raw reply related
* Re: inconsistent lock/deadlock crash, vanilla 3.3.4, 32bit, tcp
From: Denys Fedoryshchenko @ 2012-04-29 8:46 UTC (permalink / raw)
To: Eric Dumazet
Cc: netdev, davem, jmorris, yoshfuji, kaber, linux-kernel,
Glauber Costa
In-Reply-To: <1335688036.2900.80.camel@edumazet-glaptop>
On 2012-04-29 11:27, Eric Dumazet wrote:
>
>
> OK, so when we have memory pressure we can call
> percpu_counter_read_positive() with SOFTIRQ enabled, and lockdep
> complains...
>
> This bug was probably added in 2008, in commit 1748376b6626a
> (net: Use a percpu_counter for sockets_allocated)
>
> I'll have to backport the following patch, can you test it please ?
>
> Thanks !
>
> net/core/sock.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/net/core/sock.c b/net/core/sock.c
> index b2e14c0..08fc929 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1818,7 +1818,9 @@ int __sk_mem_schedule(struct sock *sk, int
> size, int kind)
>
> if (!sk_under_memory_pressure(sk))
> return 1;
> + local_bh_disable();
> alloc = sk_sockets_allocated_read_positive(sk);
> + local_bh_enable();
> if (sk_prot_mem_limits(sk, 2) > alloc *
> sk_mem_pages(sk->sk_wmem_queued +
> atomic_read(&sk->sk_rmem_alloc) +
I will try to deploy it and test as soon as someone will powercycle
server on other side.
^ permalink raw reply
* [PATCH 04/15] batman-adv: Distributed ARP Table - create DHT helper functions
From: Antonio Quartulli @ 2012-04-29 8:57 UTC (permalink / raw)
To: davem; +Cc: netdev, b.a.t.m.a.n, Antonio Quartulli
In-Reply-To: <1335689867-8017-1-git-send-email-ordex@autistici.org>
Add all the relevant functions in order to manage a Distributed Hash Table over
the B.A.T.M.A.N.-adv network. It will later be used to store several ARP entries
and implement DAT (Distributed ARP Table)
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
---
net/batman-adv/Makefile | 1 +
net/batman-adv/distributed-arp-table.c | 207 ++++++++++++++++++++++++++++++++
net/batman-adv/distributed-arp-table.h | 66 ++++++++++
net/batman-adv/hard-interface.c | 3 +
net/batman-adv/main.h | 9 +-
net/batman-adv/originator.c | 2 +
net/batman-adv/types.h | 14 +++
net/batman-adv/unicast.c | 8 +-
net/batman-adv/unicast.h | 4 +
9 files changed, 307 insertions(+), 7 deletions(-)
create mode 100644 net/batman-adv/distributed-arp-table.c
create mode 100644 net/batman-adv/distributed-arp-table.h
diff --git a/net/batman-adv/Makefile b/net/batman-adv/Makefile
index 6d5c194..84955b3 100644
--- a/net/batman-adv/Makefile
+++ b/net/batman-adv/Makefile
@@ -24,6 +24,7 @@ batman-adv-y += bat_iv_ogm.o
batman-adv-y += bat_sysfs.o
batman-adv-y += bitarray.o
batman-adv-$(CONFIG_BATMAN_ADV_BLA) += bridge_loop_avoidance.o
+batman-adv-y += distributed-arp-table.o
batman-adv-y += gateway_client.o
batman-adv-y += gateway_common.o
batman-adv-y += hard-interface.o
diff --git a/net/batman-adv/distributed-arp-table.c b/net/batman-adv/distributed-arp-table.c
new file mode 100644
index 0000000..0078478
--- /dev/null
+++ b/net/batman-adv/distributed-arp-table.c
@@ -0,0 +1,207 @@
+/*
+ * Copyright (C) 2011 B.A.T.M.A.N. contributors:
+ *
+ * Antonio Quartulli
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ *
+ */
+
+#include <linux/if_ether.h>
+#include <linux/if_arp.h>
+
+#include "main.h"
+#include "distributed-arp-table.h"
+#include "hard-interface.h"
+#include "originator.h"
+#include "send.h"
+#include "types.h"
+#include "unicast.h"
+
+static bool is_orig_node_eligible(struct dht_candidate *res, int select,
+ dat_addr_t tmp_max, dat_addr_t max,
+ dat_addr_t last_max,
+ struct orig_node *candidate,
+ struct orig_node *max_orig_node)
+{
+ bool ret = false;
+ int j;
+
+ /* Check if we have already selected this neighbour... */
+ for (j = 0; j < select; j++)
+ if (res[j].orig_node == candidate)
+ break;
+ /* ..and possibly skip it */
+ if (j < select)
+ goto out;
+ /* sanity check: has it already been selected? This should not happen */
+ if (tmp_max > last_max)
+ goto out;
+ /* check if during this iteration we have already found an originator
+ * with a closer dht address
+ */
+ if (tmp_max < max)
+ goto out;
+ /* this is an hash collision with the temporary selected node. Choose
+ * the one with the lowest address
+ */
+ if ((tmp_max == max) &&
+ (compare_eth(candidate->orig, max_orig_node->orig) > 0))
+ goto out;
+
+ ret = true;
+out:
+ return ret;
+}
+
+/* selects the next candidate by populating cands[select] and modifies last_max
+ * accordingly
+ */
+static void choose_next_candidate(struct bat_priv *bat_priv,
+ struct dht_candidate *cands, int select,
+ dat_addr_t ip_key, dat_addr_t *last_max)
+{
+ dat_addr_t max = 0, tmp_max = 0;
+ struct orig_node *orig_node, *max_orig_node = NULL;
+ struct hashtable_t *hash = bat_priv->orig_hash;
+ struct hlist_node *node;
+ struct hlist_head *head;
+ int i;
+
+ /* if no node is eligible as candidate, we will leave the candidate as
+ * NOT_FOUND
+ */
+ cands[select].type = DHT_CANDIDATE_NOT_FOUND;
+
+ /* iterate over the originator list and find the node with closest
+ * dht_address which has not been selected yet
+ */
+ for (i = 0; i < hash->size; i++) {
+ head = &hash->table[i];
+
+ rcu_read_lock();
+ hlist_for_each_entry_rcu(orig_node, node, head, hash_entry) {
+ /* the dht space is a ring and addresses are unsigned */
+ tmp_max = DAT_ADDR_MAX - orig_node->dht_addr + ip_key;
+
+ if (!is_orig_node_eligible(cands, select, tmp_max, max,
+ *last_max, orig_node,
+ max_orig_node))
+ continue;
+
+ if (!atomic_inc_not_zero(&orig_node->refcount))
+ continue;
+
+ max = tmp_max;
+ if (max_orig_node)
+ orig_node_free_ref(max_orig_node);
+ max_orig_node = orig_node;
+ }
+ rcu_read_unlock();
+ }
+ if (max_orig_node) {
+ cands[select].type = DHT_CANDIDATE_ORIG;
+ cands[select].orig_node = max_orig_node;
+ bat_dbg(DBG_DAT, bat_priv,
+ "dht_select_candidates() %d: selected %pM addr=%u dist=%u\n",
+ select, max_orig_node->orig, max_orig_node->dht_addr,
+ max);
+ }
+ *last_max = max;
+}
+
+/* Given a key, selects the candidates which the DHT message has to be sent to.
+ * An originator O is selected if and only if its DHT_ID value is one of three
+ * closest values (from the LEFT, with wrap around if needed) then the hash
+ * value of the key. ip_dst is the key.
+ *
+ * return an array of size DHT_CANDIDATES_NUM
+ */
+static struct dht_candidate *dht_select_candidates(struct bat_priv *bat_priv,
+ uint32_t ip_dst)
+{
+ int select;
+ dat_addr_t last_max = DAT_ADDR_MAX, ip_key;
+ struct dht_candidate *res;
+
+ if (!bat_priv->orig_hash)
+ return NULL;
+
+ res = kmalloc(DHT_CANDIDATES_NUM * sizeof(*res), GFP_ATOMIC);
+ if (!res)
+ return NULL;
+
+ ip_key = (dat_addr_t)hash_ipv4(&ip_dst, DAT_ADDR_MAX);
+
+ bat_dbg(DBG_DAT, bat_priv,
+ "dht_select_candidates(): IP=%pI4 hash(IP)=%u\n", &ip_dst,
+ ip_key);
+
+ for (select = 0; select < DHT_CANDIDATES_NUM; select++)
+ choose_next_candidate(bat_priv, res, select, ip_key, &last_max);
+
+ return res;
+}
+
+/* Sends the skb payload passed as argument to the candidates selected for
+ * the data represented by 'ip'. The skb is copied by means of pskb_copy()
+ * and is sent as unicast packet to each of the selected candidate.
+ *
+ * If the packet is successfully sent to at least one candidate, then this
+ * function returns true
+ */
+static bool dht_send_data(struct bat_priv *bat_priv, struct sk_buff *skb,
+ uint32_t ip, int packet_subtype)
+{
+ int i;
+ bool ret = false;
+ struct neigh_node *neigh_node = NULL;
+ struct sk_buff *tmp_skb;
+ struct dht_candidate *cand = dht_select_candidates(bat_priv, ip);
+
+ if (!cand)
+ goto out;
+
+ bat_dbg(DBG_DAT, bat_priv, "DHT_SEND for %pI4\n", &ip);
+
+ for (i = 0; i < DHT_CANDIDATES_NUM; i++) {
+ if (cand[i].type == DHT_CANDIDATE_NOT_FOUND)
+ continue;
+
+ neigh_node = orig_node_get_router(cand[i].orig_node);
+ if (!neigh_node)
+ goto free_orig;
+
+ tmp_skb = pskb_copy(skb, GFP_ATOMIC);
+ if (!prepare_unicast_4addr_packet(bat_priv, tmp_skb,
+ cand[i].orig_node,
+ packet_subtype)) {
+ kfree_skb(tmp_skb);
+ goto free_neigh;
+ }
+ if (send_skb_packet(tmp_skb, neigh_node->if_incoming,
+ neigh_node->addr) == NET_XMIT_SUCCESS)
+ /* packet sent to a candidate: we can return true */
+ ret = true;
+free_neigh:
+ neigh_node_free_ref(neigh_node);
+free_orig:
+ orig_node_free_ref(cand[i].orig_node);
+ }
+
+out:
+ kfree(cand);
+ return ret;
+}
diff --git a/net/batman-adv/distributed-arp-table.h b/net/batman-adv/distributed-arp-table.h
new file mode 100644
index 0000000..2a99763
--- /dev/null
+++ b/net/batman-adv/distributed-arp-table.h
@@ -0,0 +1,66 @@
+/*
+ * Copyright (C) 2011 B.A.T.M.A.N. contributors:
+ *
+ * Antonio Quartulli
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ *
+ */
+
+#ifndef _NET_BATMAN_ADV_ARP_H_
+#define _NET_BATMAN_ADV_ARP_H_
+
+#include "types.h"
+#include "originator.h"
+
+#define DAT_ADDR_MAX ((dat_addr_t)~(dat_addr_t)0)
+
+/* hash function to choose an entry in a hash table of given size.
+ * hash algorithm from http://en.wikipedia.org/wiki/Hash_table
+ */
+static inline uint32_t hash_ipv4(const void *data, uint32_t size)
+{
+ const unsigned char *key = data;
+ uint32_t hash = 0;
+ size_t i;
+
+ for (i = 0; i < 4; i++) {
+ hash += key[i];
+ hash += (hash << 10);
+ hash ^= (hash >> 6);
+ }
+
+ hash += (hash << 3);
+ hash ^= (hash >> 11);
+ hash += (hash << 15);
+
+ return hash % size;
+}
+
+static inline void dat_init_orig_node_dht_addr(struct orig_node *orig_node)
+{
+ orig_node->dht_addr = (dat_addr_t)choose_orig(orig_node->orig,
+ DAT_ADDR_MAX);
+}
+
+static inline void dat_init_own_dht_addr(struct bat_priv *bat_priv,
+ struct hard_iface *primary_if)
+{
+ bat_priv->dht_addr = (dat_addr_t)
+ choose_orig(primary_if->net_dev->dev_addr,
+ DAT_ADDR_MAX);
+}
+
+#endif /* _NET_BATMAN_ADV_ARP_H_ */
diff --git a/net/batman-adv/hard-interface.c b/net/batman-adv/hard-interface.c
index 220cb02..1393939 100644
--- a/net/batman-adv/hard-interface.c
+++ b/net/batman-adv/hard-interface.c
@@ -20,6 +20,7 @@
*/
#include "main.h"
+#include "distributed-arp-table.h"
#include "hard-interface.h"
#include "soft-interface.h"
#include "send.h"
@@ -118,6 +119,8 @@ static void primary_if_update_addr(struct bat_priv *bat_priv,
if (!primary_if)
goto out;
+ dat_init_own_dht_addr(bat_priv, primary_if);
+
vis_packet = (struct vis_packet *)
bat_priv->my_vis_info->skb_packet->data;
memcpy(vis_packet->vis_orig, primary_if->net_dev->dev_addr, ETH_ALEN);
diff --git a/net/batman-adv/main.h b/net/batman-adv/main.h
index d9ef4ca..10e5efb 100644
--- a/net/batman-adv/main.h
+++ b/net/batman-adv/main.h
@@ -67,6 +67,9 @@
#define NUM_WORDS BITS_TO_LONGS(TQ_LOCAL_WINDOW_SIZE)
+/* numbers of originator to contact for any PUT/GET DHT operation */
+#define DHT_CANDIDATES_NUM 3
+
#define LOG_BUF_LEN 8192 /* has to be a power of 2 */
#define VIS_INTERVAL 5000 /* 5 seconds */
@@ -111,6 +114,9 @@ enum uev_type {
#define GW_THRESHOLD 50
+#define DHT_CANDIDATE_NOT_FOUND 0
+#define DHT_CANDIDATE_ORIG 1
+
/* Debug Messages */
#ifdef pr_fmt
#undef pr_fmt
@@ -221,9 +227,6 @@ static inline bool has_timed_out(unsigned long timestamp, unsigned int timeout)
/* Returns the smallest signed integer in two's complement with the sizeof x */
#define smallest_signed_int(x) (1u << (7u + 8u * (sizeof(x) - 1u)))
-/* Returns the biggest unsigned integer with the sizeof x */
-#define biggest_unsigned_int(x) (~(x)0)
-
/* Checks if a sequence number x is a predecessor/successor of y.
* they handle overflows/underflows and can correctly check for a
* predecessor/successor unless the variable sequence number has grown by
diff --git a/net/batman-adv/originator.c b/net/batman-adv/originator.c
index ce49698..31d7b58 100644
--- a/net/batman-adv/originator.c
+++ b/net/batman-adv/originator.c
@@ -20,6 +20,7 @@
*/
#include "main.h"
+#include "distributed-arp-table.h"
#include "originator.h"
#include "hash.h"
#include "translation-table.h"
@@ -224,6 +225,7 @@ struct orig_node *get_orig_node(struct bat_priv *bat_priv, const uint8_t *addr)
orig_node->tt_poss_change = false;
orig_node->bat_priv = bat_priv;
memcpy(orig_node->orig, addr, ETH_ALEN);
+ dat_init_orig_node_dht_addr(orig_node);
orig_node->router = NULL;
orig_node->tt_crc = 0;
atomic_set(&orig_node->last_ttvn, 0);
diff --git a/net/batman-adv/types.h b/net/batman-adv/types.h
index 2f4848b..61e2d78 100644
--- a/net/batman-adv/types.h
+++ b/net/batman-adv/types.h
@@ -27,6 +27,13 @@
#include "packet.h"
#include "bitarray.h"
+/* dat_addr_t is the type used for all DHT addresses. If it is changed,
+ * DAT_ADDR_MAX is changed as well.
+ *
+ * *Please be careful: dat_addr_t must be UNSIGNED*
+ */
+#define dat_addr_t uint16_t
+
#define BAT_HEADER_LEN (ETH_HLEN + \
((sizeof(struct unicast_packet) > sizeof(struct bcast_packet) ? \
sizeof(struct unicast_packet) : \
@@ -67,6 +74,7 @@ struct hard_iface {
struct orig_node {
uint8_t orig[ETH_ALEN];
uint8_t primary_addr[ETH_ALEN];
+ dat_addr_t dht_addr;
struct neigh_node __rcu *router; /* rcu protected pointer */
unsigned long *bcast_own;
uint8_t *bcast_own_sum;
@@ -221,6 +229,7 @@ struct bat_priv {
struct gw_node __rcu *curr_gw; /* rcu protected pointer */
atomic_t gw_reselect;
struct hard_iface __rcu *primary_if; /* rcu protected pointer */
+ dat_addr_t dht_addr;
struct vis_info *my_vis_info;
struct bat_algo_ops *bat_algo_ops;
};
@@ -395,4 +404,9 @@ struct bat_algo_ops {
struct sk_buff *skb);
};
+struct dht_candidate {
+ int type;
+ struct orig_node *orig_node;
+};
+
#endif /* _NET_BATMAN_ADV_TYPES_H_ */
diff --git a/net/batman-adv/unicast.c b/net/batman-adv/unicast.c
index 96c7cec..b7152c6 100644
--- a/net/batman-adv/unicast.c
+++ b/net/batman-adv/unicast.c
@@ -313,10 +313,10 @@ static bool prepare_unicast_packet(struct sk_buff *skb,
orig_node);
}
-static bool prepare_unicast_4addr_packet(struct bat_priv *bat_priv,
- struct sk_buff *skb,
- struct orig_node *orig_node,
- int packet_subtype)
+bool prepare_unicast_4addr_packet(struct bat_priv *bat_priv,
+ struct sk_buff *skb,
+ struct orig_node *orig_node,
+ int packet_subtype)
{
struct hard_iface *primary_if;
struct unicast_4addr_packet *unicast_4addr_packet;
diff --git a/net/batman-adv/unicast.h b/net/batman-adv/unicast.h
index ae9775b..e15aa62 100644
--- a/net/batman-adv/unicast.h
+++ b/net/batman-adv/unicast.h
@@ -32,6 +32,10 @@ int frag_reassemble_skb(struct sk_buff *skb, struct bat_priv *bat_priv,
void frag_list_free(struct list_head *head);
int frag_send_skb(struct sk_buff *skb, struct bat_priv *bat_priv,
struct hard_iface *hard_iface, const uint8_t dstaddr[]);
+bool prepare_unicast_4addr_packet(struct bat_priv *bat_priv,
+ struct sk_buff *skb,
+ struct orig_node *orig_node,
+ int packet_subtype);
int unicast_generic_send_skb(struct sk_buff *skb, struct bat_priv *bat_priv,
int packet_type, int packet_subtype);
--
1.7.9.4
^ permalink raw reply related
* [PATCH 05/15] batman-adv: Distributed ARP Table - add ARP parsing functions
From: Antonio Quartulli @ 2012-04-29 8:57 UTC (permalink / raw)
To: davem; +Cc: netdev, b.a.t.m.a.n, Antonio Quartulli
In-Reply-To: <1335689867-8017-1-git-send-email-ordex@autistici.org>
ARP messages are now parsed to make it possible to trigger special actions
depending on their types (snooping).
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
---
net/batman-adv/distributed-arp-table.c | 123 ++++++++++++++++++++++++++++++++
net/batman-adv/distributed-arp-table.h | 10 +++
net/batman-adv/packet.h | 5 +-
3 files changed, 137 insertions(+), 1 deletion(-)
diff --git a/net/batman-adv/distributed-arp-table.c b/net/batman-adv/distributed-arp-table.c
index 0078478..682fb1e 100644
--- a/net/batman-adv/distributed-arp-table.c
+++ b/net/batman-adv/distributed-arp-table.c
@@ -30,6 +30,75 @@
#include "types.h"
#include "unicast.h"
+#ifdef CONFIG_BATMAN_ADV_DEBUG
+
+static void bat_dbg_arp(struct bat_priv *bat_priv, struct sk_buff *skb,
+ uint16_t type, int hdr_size, char *msg)
+{
+ struct unicast_4addr_packet *unicast_4addr_packet;
+
+ if (msg)
+ bat_dbg(DBG_DAT, bat_priv, "%s\n", msg);
+
+ bat_dbg(DBG_DAT, bat_priv, "ARP MSG = [src: %pM-%pI4 dst: %pM-%pI4]\n",
+ ARP_HW_SRC(skb, hdr_size), &ARP_IP_SRC(skb, hdr_size),
+ ARP_HW_DST(skb, hdr_size), &ARP_IP_DST(skb, hdr_size));
+
+ if (hdr_size == 0)
+ return;
+
+ /* if the AP packet is encapsulated in a batman packet, let's print some
+ * debug messages
+ */
+ unicast_4addr_packet = (struct unicast_4addr_packet *)skb->data;
+
+ switch (unicast_4addr_packet->u.header.packet_type) {
+ case BAT_UNICAST:
+ bat_dbg(DBG_DAT, bat_priv,
+ "* encapsulated within a UNICAST packet\n");
+ break;
+ case BAT_UNICAST_4ADDR:
+ bat_dbg(DBG_DAT, bat_priv,
+ "* encapsulated within a UNICAST_4ADDR packet (src: %pM)\n",
+ unicast_4addr_packet->src);
+ switch (unicast_4addr_packet->subtype) {
+ case BAT_P_DAT_DHT_PUT:
+ bat_dbg(DBG_DAT, bat_priv, "* type: DAT_DHT_PUT\n");
+ break;
+ case BAT_P_DAT_DHT_GET:
+ bat_dbg(DBG_DAT, bat_priv, "* type: DAT_DHT_GET\n");
+ break;
+ case BAT_P_DAT_CACHE_REPLY:
+ bat_dbg(DBG_DAT, bat_priv, "* type: DAT_CACHE_REPLY\n");
+ break;
+ case BAT_P_DATA:
+ bat_dbg(DBG_DAT, bat_priv, "* type: DATA\n");
+ break;
+ default:
+ bat_dbg(DBG_DAT, bat_priv, "* type: Unknown!\n");
+ }
+ break;
+ case BAT_BCAST:
+ bat_dbg(DBG_DAT, bat_priv,
+ "* encapsulated within a BCAST packet (src: %pM)\n",
+ ((struct bcast_packet *)unicast_4addr_packet)->orig);
+ break;
+ default:
+ bat_dbg(DBG_DAT, bat_priv,
+ "* encapsulated within an unknown packet type (0x%x)\n",
+ unicast_4addr_packet->u.header.packet_type);
+ }
+}
+
+#else
+
+static void bat_dbg_arp(struct bat_priv *bat_priv, struct sk_buff *skb,
+ uint16_t type, int hdr_size, char *msg)
+{
+}
+
+#endif /* CONFIG_BATMAN_ADV_DEBUG */
+
static bool is_orig_node_eligible(struct dht_candidate *res, int select,
dat_addr_t tmp_max, dat_addr_t max,
dat_addr_t last_max,
@@ -205,3 +274,57 @@ out:
kfree(cand);
return ret;
}
+
+/* Returns arphdr->ar_op if the skb contains a valid ARP packet, otherwise
+ * returns 0
+ */
+static uint16_t arp_get_type(struct bat_priv *bat_priv, struct sk_buff *skb,
+ int hdr_size)
+{
+ struct arphdr *arphdr;
+ struct ethhdr *ethhdr;
+ uint32_t ip_src, ip_dst;
+ uint16_t type = 0;
+
+ /* pull the ethernet header */
+ if (unlikely(!pskb_may_pull(skb, hdr_size + ETH_HLEN)))
+ goto out;
+
+ ethhdr = (struct ethhdr *)(skb->data + hdr_size);
+
+ if (ethhdr->h_proto != htons(ETH_P_ARP))
+ goto out;
+
+ /* pull the ARP payload */
+ if (unlikely(!pskb_may_pull(skb, hdr_size + ETH_HLEN +
+ arp_hdr_len(skb->dev))))
+ goto out;
+
+ arphdr = (struct arphdr *)(skb->data + hdr_size + ETH_HLEN);
+
+ /* Check whether the ARP packet carries a valid
+ * IP information */
+ if (arphdr->ar_hrd != htons(ARPHRD_ETHER))
+ goto out;
+
+ if (arphdr->ar_pro != htons(ETH_P_IP))
+ goto out;
+
+ if (arphdr->ar_hln != ETH_ALEN)
+ goto out;
+
+ if (arphdr->ar_pln != 4)
+ goto out;
+
+ /* Check for bad reply/request. If the ARP message is not sane, DAT
+ * will simply ignore it */
+ ip_src = ARP_IP_SRC(skb, hdr_size);
+ ip_dst = ARP_IP_DST(skb, hdr_size);
+ if (ipv4_is_loopback(ip_src) || ipv4_is_multicast(ip_src) ||
+ ipv4_is_loopback(ip_dst) || ipv4_is_multicast(ip_dst))
+ goto out;
+
+ type = ntohs(arphdr->ar_op);
+out:
+ return type;
+}
diff --git a/net/batman-adv/distributed-arp-table.h b/net/batman-adv/distributed-arp-table.h
index 2a99763..1840438 100644
--- a/net/batman-adv/distributed-arp-table.h
+++ b/net/batman-adv/distributed-arp-table.h
@@ -25,8 +25,18 @@
#include "types.h"
#include "originator.h"
+#include <linux/if_arp.h>
+
#define DAT_ADDR_MAX ((dat_addr_t)~(dat_addr_t)0)
+#define ARP_HW_SRC(skb, hdr_size) ((uint8_t *)(skb->data + hdr_size) + \
+ ETH_HLEN + sizeof(struct arphdr))
+#define ARP_IP_SRC(skb, hdr_size) (*(uint32_t *)(ARP_HW_SRC(skb, hdr_size) + \
+ ETH_ALEN))
+#define ARP_HW_DST(skb, hdr_size) (ARP_HW_SRC(skb, hdr_size) + ETH_ALEN + 4)
+#define ARP_IP_DST(skb, hdr_size) (*(uint32_t *)(ARP_HW_SRC(skb, hdr_size) + \
+ ETH_ALEN * 2 + 4))
+
/* hash function to choose an entry in a hash table of given size.
* hash algorithm from http://en.wikipedia.org/wiki/Hash_table
*/
diff --git a/net/batman-adv/packet.h b/net/batman-adv/packet.h
index e9f5184..c4533c4 100644
--- a/net/batman-adv/packet.h
+++ b/net/batman-adv/packet.h
@@ -37,7 +37,10 @@ enum bat_packettype {
};
enum bat_subtype {
- BAT_P_DATA = 0x01
+ BAT_P_DATA = 0x01,
+ BAT_P_DAT_DHT_GET = 0x02,
+ BAT_P_DAT_DHT_PUT = 0x03,
+ BAT_P_DAT_CACHE_REPLY = 0x04
};
/* this file is included by batctl which needs these defines */
--
1.7.9.4
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox