* Re: [Bugme-new] [Bug 34322] New: No ECN marking in IPv6
From: Eric Dumazet @ 2011-05-06 15:04 UTC
To: Andrew Morton
Cc: netdev, bugzilla-daemon, bugme-daemon, sgunderson,
YOSHIFUJI Hideaki
In-Reply-To: <20110505144146.4bc1e744.akpm@linux-foundation.org>
On Thursday, 05 May 2011 at 14:41 -0700, Andrew Morton wrote:
> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Tue, 3 May 2011 20:05:00 GMT
> bugzilla-daemon@bugzilla.kernel.org wrote:
>
> > https://bugzilla.kernel.org/show_bug.cgi?id=34322
> >
> > Summary: No ECN marking in IPv6
> > Product: Networking
> > Version: 2.5
> > Kernel Version: 2.6.38.4
> > Platform: All
> > OS/Version: Linux
> > Tree: Mainline
> > Status: NEW
> > Severity: normal
> > Priority: P1
> > Component: IPV6
> > AssignedTo: yoshfuji@linux-ipv6.org
> > ReportedBy: sgunderson@bigfoot.com
> > Regression: No
> >
> >
> > Hi,
> >
> > I'm completely unable to get ECN to work for IPv6. /proc/sys/net/ipv4/tcp_ecn
> > is set to 1 on both sides, and ECN works just fine for IPv4 TCP connections,
> > but when I connect over IPv6 tclass just stays at 0x0, and as far as I'd
> > understood, there should be two bits here set to 10 (like in the diffserv field
> > of IPv4), right?
> >
> > I do get ECN-echo bits in the TCP header, though (for the initial SYN and
> > SYN/ACK packets).
> >
>
> and
>
> > I looked at the code, and although I don't understand all of it, it
> > looks like e9df2e8fd8fbc95c57dbd1d33dada66c4627b44c (which also brought
> > ECN for IPv6 SCTP) might be the culprit. It seems like TCP_ECN_send()
> > calls INET_ECN_xmit(), which only sets the ECN bit in the IPv4 ToS
> > field (inet_sk(sk)->tos), but after the patch, what's checked is
> > inet6_sk(sk)->tclass, which is a completely different field.
> >
> > Is the analysis correct? Should the tclass be set as well?
>
> and
>
> > OK, as a quick hack, I did this:
> >
> > --- a/linux-2.6.38.5/include/net/inet_ecn.h 2011-04-14 22:03:56.000000000+0200
> > +++ b/linux-2.6.38.5/include/net/inet_ecn.h 2011-05-04 00:36:52.803377902+0200
> > @@ -38,7 +38,7 @@
> > return outer;
> > }
> >
> > -#define INET_ECN_xmit(sk) do { inet_sk(sk)->tos |= INET_ECN_ECT_0; } while (0)
> > +#define INET_ECN_xmit(sk) do { inet_sk(sk)->tos |= INET_ECN_ECT_0; if (inet6_sk(sk) != NULL) inet6_sk(sk)->tclass |= INET_ECN_ECT_0; } while (0)
> > #define INET_ECN_dontxmit(sk) \
> > do { inet_sk(sk)->tos &= ~INET_ECN_MASK; } while (0)
> >
> > and now my packets are properly marked with tclass 0x02 (ie.,
> > signalling ECN-capable transport, no congestion experienced yet).
> >
> > I guess this isn't the right way of doing it, but at least it confirms
> > that the lack of setting tclass is part of the problem.
>
Cc YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Hello Steinar
Analysis seems fine, but you also need to change INET_ECN_dontxmit() for
retransmitted packets.
Any chance you can refine your patch?
Thanks!
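The point above, that both the "xmit" and "dontxmit" paths have to touch the IPv6 traffic class as well as the IPv4 ToS byte, can be illustrated with a small user-space model. This is only a sketch under stated assumptions: struct model_sock and the model_* functions are hypothetical stand-ins, not kernel code, and the nullable tclass pointer simply mirrors the NULL check used in the quick hack quoted earlier. The ECN codepoint values follow RFC 3168.

```c
#include <stddef.h>
#include <stdint.h>

/* ECN codepoints: the low two bits of the IPv4 ToS byte and of the
 * IPv6 traffic class (RFC 3168). */
#define MODEL_ECN_ECT_0 0x2
#define MODEL_ECN_MASK  0x3

/* Hypothetical stand-in for a socket; not the kernel's struct sock. */
struct model_sock {
	uint8_t tos;      /* IPv4 ToS byte */
	uint8_t *tclass;  /* IPv6 traffic class, NULL for a pure IPv4 socket */
};

/* Mark outgoing packets as ECN-capable in both fields. */
void model_ecn_xmit(struct model_sock *sk)
{
	sk->tos |= MODEL_ECN_ECT_0;
	if (sk->tclass != NULL)
		*sk->tclass |= MODEL_ECN_ECT_0;
}

/* Clear the ECN bits in both fields, leaving the other six bits alone. */
void model_ecn_dontxmit(struct model_sock *sk)
{
	sk->tos &= (uint8_t)~MODEL_ECN_MASK;
	if (sk->tclass != NULL)
		*sk->tclass &= (uint8_t)~MODEL_ECN_MASK;
}
```

If only the "xmit" side were updated, the traffic class would keep its ECT(0) marking on packets where the ToS byte has already had it cleared.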
* Re: [RFC v2] virtio: add virtio-over-PCI driver
From: Ira W. Snyder @ 2011-05-06 16:06 UTC
To: Kushwaha Prabhakar-B32579
Cc: Zang Roy-R61911, Gala Kumar-B11780, Gupta Maneesh-B18878,
Aggrwal Poonam-B10812, Kalra Ashish-B00888,
linux-kernel@vger.kernel.org, linuxppc-dev@ozlabs.org,
netdev@vger.kernel.org
In-Reply-To: <071A08F2C6A57E4E94D980ECA553F8741A54D4@039-SN1MPN1-004.039d.mgd.msft.net>
On Fri, May 06, 2011 at 12:00:34PM +0000, Kushwaha Prabhakar-B32579 wrote:
> Hi,
>
> I want to use this patch as base patch for "FSL 85xx platform" to support PCIe Agent.
> The work looks to be little old now. So wanted to understand if any development has happened further on it.
>
> In case no, I would take this work forward for PCIe Agent.
>
> Any help/suggestions are most appreciated in this regard.
>
Hi Prabhakar,
I use PCI agent mode on an mpc8349emds board. All of the important setup is
done very early in the boot process, by U-Boot. Search the U-Boot source
for CONFIG_PCISLAVE. I suspect that the setup needed for 85xx boards is
similar.
This virtio-over-PCI work is now very old. It was intended to provide a
communication mechanism between a PCI Master and many PCI Agents (slaves).
Dave Miller (networking maintainer) suggested using virtio for this so
that many different devices could be used, such as:
- network interface
- serial port (for serial console)
I am aware of other ongoing work in this area. Specifically, some ARM
developers are working on a virtio API using their message registers. This
work is much newer, and will be a much better starting place for you.
Search the virtualization mailing list for:
"[PATCH 00/02] virtio: Virtio platform driver"
Here is a link to some of their code:
http://www.spinics.net/lists/linux-sh/msg07188.html
I am currently using a custom driver to provide a network device on my PCI
agents. If you search the mailing list archives for "PCINet", you will find
early versions of the driver. I am happy to provide you a current copy. It
does not use virtio at all, and is unlikely to be accepted into mainline
Linux.
I am happy to provide any of my code if you think it would help you get
started. Specifically, the current version of "PCINet" shows how to use the
DMA controller in order to get good network performance. I am also happy to
help port code to 83xx, as well as test on 83xx. Please ask any questions
you may have.
I have people ask about this code about once every two months. There is
plenty of interest in a mainline Linux solution to this problem. :) I
will be moving to 85xx someday, and I hope there is an accepted mainline
solution by then.
I hope it helps,
Ira
> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Ira Snyder
> Sent: Friday, 27 February, 2009 3:19 AM
> To: Arnd Bergmann
> Cc: linux-kernel@vger.kernel.org; Rusty Russell; Jan-Bernd Themann; linuxppc-dev@ozlabs.org; netdev@vger.kernel.org
> Subject: Re: [RFC v2] virtio: add virtio-over-PCI driver
>
> On Thu, Feb 26, 2009 at 09:37:14PM +0100, Arnd Bergmann wrote:
> > On Thursday 26 February 2009, Ira Snyder wrote:
> > > On Thu, Feb 26, 2009 at 05:15:27PM +0100, Arnd Bergmann wrote:
> > >
> > > I think so too. I was just getting something working, and thought it
> > > would be better to have it "out there" rather than be working on it
> > > forever. I'll try to break things up as I have time.
> >
> > Ok, perfect!
> >
> > > For the "libraries", would you suggest breaking things into separate
> > > code files, and using EXPORT_SYMBOL_GPL()? I'm not very familiar
> > > with doing that, I've mostly been writing code within the existing
> > > device driver frameworks. Or do I need export symbol at all? I'm not sure...
> >
> > You have both options. When you list each file as a separate module in
> > the Makefile, you use EXPORT_SYMBOL_GPL to mark functions that get
> > called by dependent modules, but this will work only in one way.
> >
> > You can also link multiple files together into one module, although it
> > is less common to link a single source file into multiple modules.
> >
>
> Ok. I'm more familiar with the EXPORT_SYMBOL_GPL interface, so I'll do that. If we decide it sucks later, we'll change it.
>
> > > I always thought you were supposed to use packed for data structures
> > > that are external to the system. I purposely designed the structures
> > > so they wouldn't need padding.
> >
> > That would only make sense for structures that are explicitly
> > unaligned, like a register layout using
> >
> > struct my_registers {
> > __le16 first;
> > __le32 second __attribute__((packed));
> > __le16 third;
> > };
> >
> > Even here, I'd recommend listing the individual members as packed
> > rather than the entire struct. Obviously if you layout the members in
> > a sane way, you don't need either.
> >
>
> Ok. I'll drop the __attribute__((packed)) and make sure there aren't problems. I don't suspect any, though.
>
> > > I mostly don't need it. In fact, the only place I'm using registers
> > > not specific to the messaging unit is in the probe routine, where I
> > > setup the 1GB window into host memory and setting up access to the
> > > guest memory on the PCI bus.
> >
> > You could add the registers you need for this to the "reg" property of
> > your device, to be mapped with of_iomap.
> >
> > If the registers for setting up this window don't logically fit into
> > the same device as the one you already use, the cleanest solution
> > would be to have another device just for this and then make a function
> > call into that driver to set up the window.
> >
>
> The registers are part of the board control registers. They don't fit at all in the message unit. Doing this in the bootloader seems like a logical place, but that would require any testers to flash a new U-Boot image into their mpc8349emds boards.
>
> The first set of access is used to set up a 1GB region in the memory map that accesses the host's memory. Any reads/writes to addresses 0x80000000-0xc0000000 actually hit the host's memory.
>
> The last access sets up PCI BAR1 to hit the memory from dma_alloc_coherent(). The bootloader already sets up the window as 16K, it just doesn't point it anywhere. Maybe this /should/ go into the bootloader. Like above, it would require testers to flash a new U-Boot image into their mpc8349emds boards.
>
> > > Now, I wouldn't need to access these registers at all if the
> > > bootloader could handle it. I just don't know if it is possible to
> > > have Linux not use some memory that the bootloader allocated, other
> > > than with the mem=XXX trick, which I'm sure wouldn't be acceptable.
> > > I've just used regular RAM so this is portable to my custom board
> > > (mpc8349emds based) and a regular mpc8349emds. I didn't want to
> > > change anything board specific.
> > >
> > > I would love to have the bootloader allocate (or reserve somewhere
> > > in the memory map) 16K of RAM, and not be required to allocate it
> > > with dma_alloc_coherent(). It would save me plenty of headaches.
> >
> > I believe you can do that through the "memory" devices in the device
> > tree, by leaving out a small part of the description of main memory,
> > and putting it into the "reg" property of your own device.
> >
>
> I'll explore this option. I didn't even know you could do this. Is a driver that requires the trick acceptable for mainline inclusion? Just like setting up the 16K PCI window, this is very platform specific.
>
> This limits the guest driver to systems which are able to change Linux's view of their memory somehow. Maybe this isn't a problem.
>
> > > Code complexity only. Also, it was easier to write 80-char lines
> > > with something like:
> > >
> > > vop_get_desc(vq, idx, &desc);
> > > if (desc.flags & VOP_DESC_F_NEXT) {
> > > /* do something */
> > > }
> > >
> > > Instead of:
> > > if (le16_to_cpu(vq->desc[idx].flags) & VOP_DESC_F_NEXT) {
> > > /* do something */
> > > }
> > >
> > > Plus, I didn't have to remember how many bits were in each field. I
> > > just thought it made everything simpler to understand. Suggestions?
> >
> > hmm, in this particular case, you could change the definition of
> > VOP_DESC_F_NEXT to
> >
> > #define VOP_DESC_F_NEXT cpu_to_le16(1)
> >
> > and then do the code as the even simpler (source and object code wise)
> >
> > if (vq->desc[idx].flags & VOP_DESC_F_NEXT)
> >
> > I'm not sure if you can do something along these lines for the other
> > cases as well though.
> >
>
> That's a good idea. It wouldn't fix the addresses, lengths, and next fields, though. I'll make the change and see how bad it is, then report back. It may not be so bad after all.
>
> > > I used 3 so they would align to 1024 byte boundaries within a
> > > 4K page. Then the layout was 16K on the bus, each 4K page is a
> > > single virtio-device, and each 1K block is a single virtqueue. The
> > > first 1K is for virtio-device status and feature bits, etc.
> > >
> > > Packing them differently isn't a problem. It was just easier to code
> > > because setting up a window with the correct size is so platform
> > > specific.
> >
> > Ok. I guess the important question is what part of the code makes this
> > decision. Ideally, the virtio-net glue would instantiate the device
> > with the right number of queues.
> >
>
> Yeah, virtio doesn't work that way.
>
> The virtio drivers just call find_vq() with a different index for each queue they want to use. You have no way of knowing how many queues each virtio driver will want, unless you go read their source code.
>
> virtio-net currently uses 3 queues, but we only support the first two.
> The third is optional (for now...), and non-symmetric.
>
> Thanks again,
> Ira
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>
>
>
* Re: [PATCH 0/2] wireless: Make and use const struct ieee80211_channel
From: Joe Perches @ 2011-05-06 16:10 UTC
To: John W. Linville
Cc: libertas-dev, linux-wireless, orinoco-users, orinoco-devel,
netdev, LKML
In-Reply-To: <20110506131922.GA2252@tuxdriver.com>
On Fri, 2011-05-06 at 09:19 -0400, John W. Linville wrote:
> On Thu, May 05, 2011 at 03:21:47PM -0700, Joe Perches wrote:
> > On Thu, 2011-05-05 at 14:49 -0400, John W. Linville wrote:
> > > These patches generated a lot of warnings in net/mac80211. Did you
> > > actually build them?
> > Yes.
> > Did you apply patch 1/2 first?
> > It's a dependent patch.
> That's the one that causes most of the warnings...
Consider the 2 patches as a single patch.
Do you have new build warnings after applying both
patches 1 and 2?
* Re: ARM, AF_PACKET: caching problems on Marvell Kirkwood
From: Phil Sutter @ 2011-05-06 16:12 UTC
To: Eric Dumazet
Cc: linux-arm-kernel, netdev, ne, Johann Baudy, Lennert Buytenhek,
Nicolas Pitre
In-Reply-To: <1304607362.3032.84.camel@edumazet-laptop>
Hi,
On Thu, May 05, 2011 at 04:56:02PM +0200, Eric Dumazet wrote:
> I assume you use latest linux-2.6 or net-next-2.6 ?
Well, initially we noticed the problem on 2.6.34.7, but I verified it
against both 2.6.37 and linux-2.6 from three days ago.
> Could you try to force vmalloc() use ?
>
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index b5362e9..0b5a89c 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -2383,7 +2383,7 @@ static inline char *alloc_one_pg_vec_page(unsigned long order)
> gfp_t gfp_flags = GFP_KERNEL | __GFP_COMP |
> __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY;
>
> - buffer = (char *) __get_free_pages(gfp_flags, order);
> + buffer = NULL;
>
> if (buffer)
> return buffer;
Thanks for the hint. I tried that, but the problem persists.
Greetings, Phil
* Re: [PATCH] tcp_cubic: limit delayed_ack ratio to prevent divide error
From: TB @ 2011-05-06 16:15 UTC
To: Brandeburg, Jesse
Cc: Stephen Hemminger, David Miller, Sangtae Ha, Injong Rhee,
Valdis.Kletnieks@vt.edu, rdunlap@xenotime.net,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <alpine.WNT.2.00.1105041352020.6048@JBRANDEB-DESK2.amr.corp.intel.com>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 11-05-04 04:53 PM, Brandeburg, Jesse wrote:
>
>
> On Wed, 4 May 2011, Stephen Hemminger wrote:
>
>> TCP Cubic keeps a metric that estimates the amount of delayed
>> acknowledgements to use in adjusting the window. If an abnormally
>> large number of packets are acknowledged at once, then the update
>> could wrap and reach zero. This kind of ACK could only
>> happen when there was a large window and huge number of
>> ACK's were lost.
>>
>> This patch limits the value of delayed ack ratio. The choice of 32
>> is just a conservative value since normally it should be range of
>> 1 to 4 packets.
>>
>> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>
> patch seems fine, but please credit the reporter (lkml@techboom.com) with
> reporting the issue with logs, maybe even with Reported-by: and some kind
> of reference to the panic message or the email thread in the text or
> header?
We're currently testing the patch on 6 production servers
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iQEcBAEBAgAGBQJNxB6yAAoJENOh8x1aI8Ye4ocH/3+6gjWWppgOwql0J4XGGD5R
wJX+u8A+YK2V+GBvxFgQs/qNa3IB/nnWwELolflO80twq2JrOq1I6g2n1VJhHjX4
b5jyROMe2gPHRKESibi84gNIuoImq4bqM/S1u7xWzcikTh8FxCevYQXTNilIKOOf
siuOIypFY7AyqSPjhq5/+HpTrrOQa097PAcVAr8RBO7niyrxAE75ACTolGAKBfvQ
HlOYKmxBT8SbnZ7YJNINopPdtpqz3iaraKWUoT44Wuv8Q8jt0cqB7YJWl0RG/C3y
ABK50Qihl1p6M+LL9jjR2YwVFkjiLyN3fO8g2pjVfn4wh0afFCyWtitN0OFd/4I=
=Vy5E
-----END PGP SIGNATURE-----
* Re: ARM, AF_PACKET: caching problems on Marvell Kirkwood
From: Phil Sutter @ 2011-05-06 16:17 UTC
To: Andrew Lunn
Cc: linux-arm-kernel, netdev, ne, Johann Baudy, Lennert Buytenhek,
Nicolas Pitre
In-Reply-To: <20110505194601.GA10565@lunn.ch>
Hi,
On Thu, May 05, 2011 at 09:46:01PM +0200, Andrew Lunn wrote:
> I can reproduce it on a Kirkwood:
>
> [ 0.000000] CPU: Feroceon 88FR131 [56251311] revision 1 (ARMv5TE), cr=00053977
Thanks for the information. Seems like we have the same CPU:
| [ 0.000000] CPU: Feroceon 88FR131 [56251311] revision 1 (ARMv5TE), cr=00053177
| [ 0.000000] CPU: VIVT data cache, VIVT instruction cache
and it's actually VIVT, not VIPT as I wrote in an earlier mail.
Greetings, Phil
* [PATCH] NET: slip, fix ldisc->open retval
From: Matvejchikov Ilya @ 2011-05-06 16:23 UTC
To: netdev
TTY layer expects 0 if the ldisc->open operation succeeded.
Signed-off-by: Matvejchikov Ilya <matvejchikov@gmail.com>
---
drivers/net/slip.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/drivers/net/slip.c b/drivers/net/slip.c
index 86cbb9e..8ec1a9a 100644
--- a/drivers/net/slip.c
+++ b/drivers/net/slip.c
@@ -853,7 +853,9 @@ static int slip_open(struct tty_struct *tty)
/* Done. We have linked the TTY line to a channel. */
rtnl_unlock();
tty->receive_room = 65536; /* We don't flow control */
- return sl->dev->base_addr;
+
+ /* TTY layer expects 0 on success */
+ return 0;
err_free_bufs:
sl_free_bufs(sl);
--
1.7.5.1
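As an illustration of why the return value matters, here is a minimal user-space sketch. Everything in it is hypothetical: model_open_fn and model_attach() are not the kernel's tty_ldisc_ops interface, and the idea that the old return value was a non-zero channel index is an assumption made for the example. It only shows how a caller that follows the "0 means success" convention would misread a non-zero success value.

```c
#include <stddef.h>

/* Hypothetical open callback type; not the kernel's ldisc signature. */
typedef int (*model_open_fn)(void *tty);

/* Models the pre-patch behaviour: success returns some device field,
 * assumed here to be a non-zero channel index. */
int model_open_returns_index(void *tty)
{
	(void)tty;
	return 1;
}

/* Models the post-patch behaviour: success always returns 0. */
int model_open_returns_zero(void *tty)
{
	(void)tty;
	return 0;
}

/* Hypothetical caller that treats any non-zero return as a failure. */
int model_attach(model_open_fn open_cb, void *tty)
{
	int ret = open_cb(tty);

	if (ret != 0)
		return -1;	/* interpreted as "open failed" */
	return 0;
}
```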
* Re: [PATCH] tcp_cubic: limit delayed_ack ratio to prevent divide error
From: Stephen Hemminger @ 2011-05-06 16:53 UTC
To: TB
Cc: Brandeburg, Jesse, David Miller, Sangtae Ha, Injong Rhee,
Valdis.Kletnieks@vt.edu, rdunlap@xenotime.net,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <4DC41EB2.6070404@techboom.com>
On Fri, 06 May 2011 12:15:46 -0400
TB <lkml@techboom.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 11-05-04 04:53 PM, Brandeburg, Jesse wrote:
> >
> >
> > On Wed, 4 May 2011, Stephen Hemminger wrote:
> >
> >> TCP Cubic keeps a metric that estimates the amount of delayed
> >> acknowledgements to use in adjusting the window. If an abnormally
> >> large number of packets are acknowledged at once, then the update
> >> could wrap and reach zero. This kind of ACK could only
> >> happen when there was a large window and huge number of
> >> ACK's were lost.
> >>
> >> This patch limits the value of delayed ack ratio. The choice of 32
> >> is just a conservative value since normally it should be range of
> >> 1 to 4 packets.
> >>
> >> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> >
> > patch seems fine, but please credit the reporter (lkml@techboom.com) with
> > reporting the issue with logs, maybe even with Reported-by: and some kind
> > of reference to the panic message or the email thread in the text or
> > header?
>
> We're currently testing the patch on 6 production servers
Thank you. Was there any regularity to the earlier failures?
* For the netdev list
From: Tom Goetz @ 2011-05-06 16:57 UTC
To: netdev
[-- Attachment #1: Type: text/plain, Size: 569 bytes --]
We recently obtained a new Lenovo Edge 0578-CTO. The r8169 driver causes instability in the system on this machine. The problem is that in rtl8169_rx_interrupt (status & 0x00001FFF) yields values less than four on this machine. This results in this line:
int pkt_size = (status & 0x00001FFF) - 4;
giving a huge packet size, which causes problems when the packet is copied. As a workaround we've added a patch to drop packets when we see this condition. I have attached lspci -vvv for this device and a patch for the workaround we're using.
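The arithmetic described above can be reproduced in a few lines of user-space C. This is a sketch, not driver code: the function names and MODEL_* constants are made up for the example, and reading the 4 subtracted bytes as a frame trailer is an assumption. It shows that a length field below four makes the subtraction go negative, and that the negative value turns into a very large number once it is used as a size.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_RX_LEN_MASK 0x00001FFFu	/* length bits of the status word */
#define MODEL_TRAILER_LEN 4u		/* bytes subtracted in the report */

/* Same expression as the line quoted in the report, without any check. */
int model_rx_len_unchecked(uint32_t status)
{
	return (int)(status & MODEL_RX_LEN_MASK) - 4;
}

/* Checked variant: returns 0 and stores the length, or -1 when the
 * reported length is too short to subtract the trailer from. */
int model_rx_len_checked(uint32_t status, size_t *len)
{
	uint32_t raw = status & MODEL_RX_LEN_MASK;

	if (raw < MODEL_TRAILER_LEN)
		return -1;
	*len = raw - MODEL_TRAILER_LEN;
	return 0;
}
```

The attached workaround takes the same approach as the checked variant: it tests the masked length before using it and drops the packet if the test fails.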
-Tom Goetz
[-- Attachment #2: r8168_lspci.log --]
[-- Type: application/octet-stream, Size: 3293 bytes --]
09:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 03)
Subsystem: Lenovo Device 2131
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 299
Region 0: I/O ports at 4000 [size=256]
Region 2: Memory at f0904000 (64-bit, prefetchable) [size=4K]
Region 4: Memory at f0900000 (64-bit, prefetchable) [size=16K]
[virtual] Expansion ROM at f0920000 [disabled] [size=128K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee0200c Data: 41c8
Capabilities: [70] Express (v2) Endpoint, MSI 01
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 4096 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <64us
ClockPM+ Surprise- LLActRep- BwNot-
LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis+
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB
Capabilities: [ac] MSI-X: Enable- Count=4 Masked-
Vector table: BAR=4 offset=00000000
PBA: BAR=4 offset=00000800
Capabilities: [cc] Vital Product Data
Unknown small resource type 00, will not decode more.
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [140 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
Status: NegoPending- InProgress-
Capabilities: [160 v1] Device Serial Number d4-00-00-00-68-4c-e0-00
Kernel driver in use: r8169
Kernel modules: r8169
[-- Attachment #3: rtl8169.patch --]
[-- Type: application/octet-stream, Size: 652 bytes --]
diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 7ffdb80..4c8ad2a 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -4579,6 +4579,14 @@ static int rtl8169_rx_interrupt(struct net_device *dev,
dma_addr_t addr = le64_to_cpu(desc->addr);
int pkt_size = (status & 0x00001FFF) - 4;
+ if ((status & 0x00001FFF) < 4) {
+ dev->stats.rx_dropped++;
+ dev->stats.rx_length_errors++;
+ rtl8169_mark_to_asic(desc, rx_buf_sz);
+ printk("%s: bad packet length!\n", __FUNCTION__);
+ continue;
+ }
+
/*
* The driver does not support incoming fragmented
* frames. They are seen as a symptom of over-mtu
* Re: [RFC v3 02/10] Revert "lsm: Remove the socket_post_accept() hook"
From: Paul Moore @ 2011-05-06 17:27 UTC
To: Samir Bellabes
Cc: Tetsuo Handa, linux-security-module, linux-kernel, netdev,
netfilter-devel, hadi, kaber, zbr, root
In-Reply-To: <87iptop4di.fsf@synack.fr>
On Friday, May 06, 2011 5:25:45 AM Samir Bellabes wrote:
> the main argument for socket_post_accept is to know information about the
> remote inet.
>
> from socket_accept(), we have no clue who (inet->daddr and inet->saddr)
> is connecting to the local service. with socket_post_accept(), inet->daddr
> and inet->saddr are filled with the true remote information.
>
> This information is useful for subsequent security operations on the
> socket (we know who we are talking to).
Looking at the snet_socket_post_accept() hook, I believe all of the
information you are looking for should be available to you in the sock_graft()
hook.
--
paul moore
linux @ hp
* Re: [PATCH] tcp_cubic: limit delayed_ack ratio to prevent divide error
From: TB @ 2011-05-06 17:39 UTC
To: Stephen Hemminger
Cc: Brandeburg, Jesse, David Miller, Sangtae Ha, Injong Rhee,
Valdis.Kletnieks@vt.edu, rdunlap@xenotime.net,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20110506095359.57c4fb38@nehalam>
On 11-05-06 12:53 PM, Stephen Hemminger wrote:
> On Fri, 06 May 2011 12:15:46 -0400
> TB <lkml@techboom.com> wrote:
>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> On 11-05-04 04:53 PM, Brandeburg, Jesse wrote:
>>>
>>>
>>> On Wed, 4 May 2011, Stephen Hemminger wrote:
>>>
>>>> TCP Cubic keeps a metric that estimates the amount of delayed
>>>> acknowledgements to use in adjusting the window. If an abnormally
>>>> large number of packets are acknowledged at once, then the update
>>>> could wrap and reach zero. This kind of ACK could only
>>>> happen when there was a large window and huge number of
>>>> ACK's were lost.
>>>>
>>>> This patch limits the value of delayed ack ratio. The choice of 32
>>>> is just a conservative value since normally it should be range of
>>>> 1 to 4 packets.
>>>>
>>>> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>>>
>>> patch seems fine, but please credit the reporter (lkml@techboom.com) with
>>> reporting the issue with logs, maybe even with Reported-by: and some kind
>>> of reference to the panic message or the email thread in the text or
>>> header?
>>
>> We're currently testing the patch on 6 production servers
>
> Thank you, is there some regularity to the failures previously?
Not really. For some odd reason it was more likely to happen after a reboot and
during the night (when there is less traffic).
As a workaround we switched most of the servers to reno.
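For readers following the thread, the wrap described in the quoted patch description can be modelled in user space. This is a sketch under loudly stated assumptions: the 16-bit width of the stored ratio, the shift of 4 and the exact form of the running-average update are assumptions chosen for illustration; only the limit of 32 comes from the patch description. The function names are hypothetical.

```c
#include <stdint.h>

#define MODEL_ACK_RATIO_SHIFT 4				/* assumed fixed-point shift */
#define MODEL_ACK_RATIO_LIMIT (32u << MODEL_ACK_RATIO_SHIFT)	/* limit of 32 */

/* Running-average update stored back into an (assumed) 16-bit field
 * without a limit. A large enough cnt makes the stored value wrap. */
uint16_t model_ratio_unclamped(uint16_t delayed_ack, uint32_t cnt)
{
	uint32_t ratio = delayed_ack;

	ratio -= delayed_ack >> MODEL_ACK_RATIO_SHIFT;
	ratio += cnt;
	return (uint16_t)ratio;		/* truncation can yield 0 */
}

/* Same update with the result limited, so it can never wrap to 0. */
uint16_t model_ratio_clamped(uint16_t delayed_ack, uint32_t cnt)
{
	uint32_t ratio = delayed_ack;

	ratio -= delayed_ack >> MODEL_ACK_RATIO_SHIFT;
	ratio += cnt;
	if (ratio > MODEL_ACK_RATIO_LIMIT)
		ratio = MODEL_ACK_RATIO_LIMIT;
	return (uint16_t)ratio;
}
```

A stored ratio of 0 is what makes a later division by that ratio fail, which matches the "divide error" in the subject line.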
* Re: [Bugme-new] [Bug 34322] New: No ECN marking in IPv6
From: Steinar H. Gunderson @ 2011-05-06 17:12 UTC
To: Eric Dumazet
Cc: Andrew Morton, netdev, bugzilla-daemon, bugme-daemon,
YOSHIFUJI Hideaki
In-Reply-To: <1304694292.3066.29.camel@edumazet-laptop>
On Fri, May 06, 2011 at 05:04:52PM +0200, Eric Dumazet wrote:
> Analysis seems fine, but you also need to change INET_ECN_dontxmit() for
> retransmitted packets.
>
> Any chance you can refine your patch ?
Sure, but is checking against NULL really the right way of checking for IPv6
sockets? I'd imagined I should check the address family or something
instead...
/* Steinar */
--
Homepage: http://www.sesse.net/
* [PATCH] bonding: convert to ndo_fix_features
From: Michał Mirosław @ 2011-05-06 17:56 UTC
To: netdev; +Cc: Jay Vosburgh, Andy Gospodarek
This should also fix updating of vlan_features and propagating changes to
VLAN devices on the bond.
Side effect: it allows the user to force-disable some offloads on the bond
interface.
Note: NETIF_F_VLAN_CHALLENGED is managed by bond_fix_features() now.
BTW, what are the problems with creating VLAN devices on an empty bond
(as stated in one of the bond_setup() comments)?
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
Note: This is only compile-tested so far.
drivers/net/bonding/bond_main.c | 133 +++++++++++++++------------------------
1 files changed, 50 insertions(+), 83 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 9a5feaf..04a2205 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -344,32 +344,6 @@ out:
}
/**
- * bond_has_challenged_slaves
- * @bond: the bond we're working on
- *
- * Searches the slave list. Returns 1 if a vlan challenged slave
- * was found, 0 otherwise.
- *
- * Assumes bond->lock is held.
- */
-static int bond_has_challenged_slaves(struct bonding *bond)
-{
- struct slave *slave;
- int i;
-
- bond_for_each_slave(bond, slave, i) {
- if (slave->dev->features & NETIF_F_VLAN_CHALLENGED) {
- pr_debug("found VLAN challenged slave - %s\n",
- slave->dev->name);
- return 1;
- }
- }
-
- pr_debug("no VLAN challenged slaves found\n");
- return 0;
-}
-
-/**
* bond_next_vlan - safely skip to the next item in the vlans list.
* @bond: the bond we're working on
* @curr: item we're advancing from
@@ -1406,52 +1380,61 @@ static int bond_sethwaddr(struct net_device *bond_dev,
return 0;
}
-#define BOND_VLAN_FEATURES \
- (NETIF_F_VLAN_CHALLENGED | NETIF_F_HW_VLAN_RX | NETIF_F_HW_VLAN_TX | \
- NETIF_F_HW_VLAN_FILTER)
-
-/*
- * Compute the common dev->feature set available to all slaves. Some
- * feature bits are managed elsewhere, so preserve those feature bits
- * on the master device.
- */
-static int bond_compute_features(struct bonding *bond)
+static u32 bond_fix_features(struct net_device *dev, u32 features)
{
struct slave *slave;
- struct net_device *bond_dev = bond->dev;
- u32 features = bond_dev->features;
- u32 vlan_features = 0;
- unsigned short max_hard_header_len = max((u16)ETH_HLEN,
- bond_dev->hard_header_len);
+ struct bonding *bond = netdev_priv(dev);
+ u32 mask;
int i;
- features &= ~(NETIF_F_ALL_CSUM | BOND_VLAN_FEATURES);
- features |= NETIF_F_GSO_MASK | NETIF_F_NO_CSUM | NETIF_F_NOCACHE_COPY;
-
if (!bond->first_slave)
- goto done;
+ /* Disable adding VLANs to empty bond. But why? --mq */
+ return features | NETIF_F_VLAN_CHALLENGED;
+ mask = features;
features &= ~NETIF_F_ONE_FOR_ALL;
+ features |= NETIF_F_ALL_FOR_ALL;
- vlan_features = bond->first_slave->dev->vlan_features;
bond_for_each_slave(bond, slave, i) {
features = netdev_increment_features(features,
slave->dev->features,
- NETIF_F_ONE_FOR_ALL);
+ mask);
+ }
+
+ return features;
+}
+
+#define BOND_VLAN_FEATURES (NETIF_F_ALL_TX_OFFLOADS | \
+ NETIF_F_SOFT_FEATURES | \
+ NETIF_F_LRO)
+
+static void bond_compute_features(struct bonding *bond)
+{
+ struct slave *slave;
+ struct net_device *bond_dev = bond->dev;
+ u32 old_features, vlan_features = BOND_VLAN_FEATURES;
+ unsigned short max_hard_header_len = ETH_HLEN;
+ int i;
+
+ if (!bond->first_slave)
+ goto done;
+
+ bond_for_each_slave(bond, slave, i) {
vlan_features = netdev_increment_features(vlan_features,
- slave->dev->vlan_features,
- NETIF_F_ONE_FOR_ALL);
+ slave->dev->vlan_features, BOND_VLAN_FEATURES);
+
if (slave->dev->hard_header_len > max_hard_header_len)
max_hard_header_len = slave->dev->hard_header_len;
}
done:
- features |= (bond_dev->features & BOND_VLAN_FEATURES);
- bond_dev->features = netdev_fix_features(bond_dev, features);
- bond_dev->vlan_features = netdev_fix_features(bond_dev, vlan_features);
+ bond_dev->vlan_features = vlan_features;
bond_dev->hard_header_len = max_hard_header_len;
- return 0;
+ old_features = bond_dev->features;
+ netdev_update_features(bond_dev);
+ if (old_features == bond_dev->features)
+ netdev_features_change(bond_dev);
}
static void bond_setup_by_slave(struct net_device *bond_dev,
@@ -1544,7 +1527,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
struct netdev_hw_addr *ha;
struct sockaddr addr;
int link_reporting;
- int old_features = bond_dev->features;
int res = 0;
if (!bond->params.use_carrier && slave_dev->ethtool_ops == NULL &&
@@ -1577,16 +1559,9 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
pr_warning("%s: Warning: enslaved VLAN challenged slave %s. Adding VLANs will be blocked as long as %s is part of bond %s\n",
bond_dev->name, slave_dev->name,
slave_dev->name, bond_dev->name);
- bond_dev->features |= NETIF_F_VLAN_CHALLENGED;
}
} else {
pr_debug("%s: ! NETIF_F_VLAN_CHALLENGED\n", slave_dev->name);
- if (bond->slave_cnt == 0) {
- /* First slave, and it is not VLAN challenged,
- * so remove the block of adding VLANs over the bond.
- */
- bond_dev->features &= ~NETIF_F_VLAN_CHALLENGED;
- }
}
/*
@@ -1958,7 +1933,7 @@ err_free:
kfree(new_slave);
err_undo_flags:
- bond_dev->features = old_features;
+ bond_compute_features(bond);
return res;
}
@@ -1979,6 +1954,7 @@ int bond_release(struct net_device *bond_dev, struct net_device *slave_dev)
struct bonding *bond = netdev_priv(bond_dev);
struct slave *slave, *oldcurrent;
struct sockaddr addr;
+ u32 old_features = bond_dev->features;
/* slave is not a slave or master is not master of this slave */
if (!(slave_dev->flags & IFF_SLAVE) ||
@@ -2084,19 +2060,16 @@ int bond_release(struct net_device *bond_dev, struct net_device *slave_dev)
*/
memset(bond_dev->dev_addr, 0, bond_dev->addr_len);
- if (!bond->vlgrp) {
- bond_dev->features |= NETIF_F_VLAN_CHALLENGED;
- } else {
+ if (bond->vlgrp) {
pr_warning("%s: Warning: clearing HW address of %s while it still has VLANs.\n",
bond_dev->name, bond_dev->name);
pr_warning("%s: When re-adding slaves, make sure the bond's HW address matches its VLANs'.\n",
bond_dev->name);
}
- } else if ((bond_dev->features & NETIF_F_VLAN_CHALLENGED) &&
- !bond_has_challenged_slaves(bond)) {
+ } else if (!(bond_dev->features & NETIF_F_VLAN_CHALLENGED) &&
+ old_features & NETIF_F_VLAN_CHALLENGED) {
pr_info("%s: last VLAN challenged slave %s left bond %s. VLAN blocking is removed\n",
bond_dev->name, slave_dev->name, bond_dev->name);
- bond_dev->features &= ~NETIF_F_VLAN_CHALLENGED;
}
write_unlock_bh(&bond->lock);
@@ -2269,9 +2242,7 @@ static int bond_release_all(struct net_device *bond_dev)
*/
memset(bond_dev->dev_addr, 0, bond_dev->addr_len);
- if (!bond->vlgrp) {
- bond_dev->features |= NETIF_F_VLAN_CHALLENGED;
- } else {
+ if (bond->vlgrp) {
pr_warning("%s: Warning: clearing HW address of %s while it still has VLANs.\n",
bond_dev->name, bond_dev->name);
pr_warning("%s: When re-adding slaves, make sure the bond's HW address matches its VLANs'.\n",
@@ -4347,11 +4318,6 @@ static void bond_ethtool_get_drvinfo(struct net_device *bond_dev,
static const struct ethtool_ops bond_ethtool_ops = {
.get_drvinfo = bond_ethtool_get_drvinfo,
.get_link = ethtool_op_get_link,
- .get_tx_csum = ethtool_op_get_tx_csum,
- .get_sg = ethtool_op_get_sg,
- .get_tso = ethtool_op_get_tso,
- .get_ufo = ethtool_op_get_ufo,
- .get_flags = ethtool_op_get_flags,
};
static const struct net_device_ops bond_netdev_ops = {
@@ -4377,6 +4343,7 @@ static const struct net_device_ops bond_netdev_ops = {
#endif
.ndo_add_slave = bond_enslave,
.ndo_del_slave = bond_release,
+ .ndo_fix_features = bond_fix_features,
};
static void bond_destructor(struct net_device *bond_dev)
@@ -4432,14 +4399,14 @@ static void bond_setup(struct net_device *bond_dev)
* when there are slaves that are not hw accel
* capable
*/
- bond_dev->features |= (NETIF_F_HW_VLAN_TX |
- NETIF_F_HW_VLAN_RX |
- NETIF_F_HW_VLAN_FILTER);
- /* By default, we enable GRO on bonding devices.
- * Actual support requires lowlevel drivers are GRO ready.
- */
- bond_dev->features |= NETIF_F_GRO;
+ bond_dev->hw_features = BOND_VLAN_FEATURES |
+ NETIF_F_HW_VLAN_TX |
+ NETIF_F_HW_VLAN_RX |
+ NETIF_F_HW_VLAN_FILTER;
+
+ bond_dev->hw_features &= ~(NETIF_F_ALL_CSUM & ~NETIF_F_NO_CSUM);
+ bond_dev->features |= bond_dev->hw_features;
}
static void bond_work_cancel_all(struct bonding *bond)
--
1.7.2.5
^ permalink raw reply related
* [PATCH] net: Fix vlan_features propagation
From: Michał Mirosław @ 2011-05-06 17:56 UTC (permalink / raw)
To: netdev; +Cc: Patrick McHardy
Fix VLAN features propagation for devices which change vlan_features.
For this to work, the driver needs to make sure netdev_features_change()
gets called after the change (as it is, e.g., after ndo_set_features()).
Side effect is that a user might request features that will never
be enabled on a VLAN device.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
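Illustration only, not part of the patch: a rough userspace sketch of the
masking order that vlan_dev_fix_features() follows after this change. The
SK_F_* bit values, struct sketch_real_dev and sketch_vlan_fix_features() are
made-up names for this example and are not kernel definitions; rx_csum stands
in for the result of dev_ethtool_get_rx_csum().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit values only; these are NOT the kernel's NETIF_F_* values. */
#define SK_F_SG      (1u << 0)
#define SK_F_TSO     (1u << 1)
#define SK_F_RXCSUM  (1u << 2)
#define SK_F_LLTX    (1u << 3)

/* Hypothetical stand-in for the parts of struct net_device used here. */
struct sketch_real_dev {
	uint32_t features;      /* what the lower device has enabled right now */
	uint32_t vlan_features; /* what it can offload for VLAN-tagged frames */
	int rx_csum;            /* stand-in for dev_ethtool_get_rx_csum() */
};

/* Same order of operations as the patched vlan_dev_fix_features(). */
static uint32_t sketch_vlan_fix_features(const struct sketch_real_dev *real,
					 uint32_t requested)
{
	assert(real != NULL);

	requested &= real->features;      /* lower device must have it enabled */
	requested &= real->vlan_features; /* and must allow it for VLANs */
	if (real->rx_csum)
		requested |= SK_F_RXCSUM;
	requested |= SK_F_LLTX;           /* set regardless of the lower device */
	return requested;
}
```

With a lower device that has SG and TSO enabled but lists only SG in
vlan_features, a request for SG|TSO comes back as SG plus RXCSUM and LLTX.
Once the lower device adds TSO to vlan_features and the features are
recomputed, the same request keeps TSO. Until then the requested bit simply
stays off, which is the side effect noted in the changelog.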
net/8021q/vlan_dev.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index d174c31..526159a 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -531,7 +531,7 @@ static int vlan_dev_init(struct net_device *dev)
(1<<__LINK_STATE_DORMANT))) |
(1<<__LINK_STATE_PRESENT);
- dev->hw_features = real_dev->vlan_features & NETIF_F_ALL_TX_OFFLOADS;
+ dev->hw_features = NETIF_F_ALL_TX_OFFLOADS;
dev->features |= real_dev->vlan_features | NETIF_F_LLTX;
dev->gso_max_size = real_dev->gso_max_size;
@@ -590,9 +590,11 @@ static u32 vlan_dev_fix_features(struct net_device *dev, u32 features)
{
struct net_device *real_dev = vlan_dev_info(dev)->real_dev;
- features &= (real_dev->features | NETIF_F_LLTX);
+ features &= real_dev->features;
+ features &= real_dev->vlan_features;
if (dev_ethtool_get_rx_csum(real_dev))
features |= NETIF_F_RXCSUM;
+ features |= NETIF_F_LLTX;
return features;
}
--
1.7.2.5
^ permalink raw reply related
* Re: [PATCH v6 BONUS 4/3] ipv4: Store rtable entries directly in FIB
From: David Miller @ 2011-05-06 17:57 UTC (permalink / raw)
To: ja; +Cc: netdev, tgraf, jpirko, herbert, eric.dumazet
In-Reply-To: <alpine.LFD.2.00.1105060944230.1435@ja.ssi.bg>
From: Julian Anastasov <ja@ssi.bg>
Date: Fri, 6 May 2011 12:12:26 +0300 (EEST)
> Caching results of __mkroute_output in NH does
> not work well for RTN_MULTICAST because ip_check_mc_rcu
> wants to further restrict local delivery depending on
> the source address and protocol.
I understand that multicast needs special handling.
I'm concentrating on unicast/broadcast at the moment because
there is a predominantly clear path for making that work.
^ permalink raw reply
* Re: [PATCH] bonding: convert to ndo_fix_features
From: Jay Vosburgh @ 2011-05-06 18:18 UTC (permalink / raw)
To: Michał Mirosław; +Cc: netdev, Andy Gospodarek
In-Reply-To: <20110506175629.BC59D1389B@rere.qmqm.pl>
Michał Mirosław <mirq-linux@rere.qmqm.pl> wrote:
>This should also fix updating of vlan_features and propagating changes to
>VLAN devices on the bond.
>
>Side effect: it allows user to force-disable some offloads on the bond
>interface.
>
>Note: NETIF_F_VLAN_CHALLENGED is managed by bond_fix_features() now.
>
>BTW, What are the problems in creating VLAN devices on an empty bond
>(as stated in one of bond_setup() comments)?
If there are no slaves, then the bond does not have a MAC
address assigned (because it gets its initial MAC from the first slave).
It's therefore impossible to pass a MAC address up to the VLAN
interface.
So the limitation is that the bond must have at least one slave
before a VLAN may be configured above it.
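A simplified userspace illustration of that rule, together with the
"one for all" / "all for all" combining that the quoted bond_fix_features()
relies on. This is a sketch under stated assumptions, not the kernel code:
the SK_* bits and sketch_bond_fix_features() are hypothetical names, which
bits belong to which group is chosen arbitrarily here, and the checksum
handling done by the real netdev_increment_features() is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bits, not the kernel's NETIF_F_* values. */
#define SK_F_SG              (1u << 0) /* "one for all": one slave is enough */
#define SK_F_TSO             (1u << 1) /* "all for all": every slave needs it */
#define SK_F_VLAN_CHALLENGED (1u << 2)

#define SK_ONE_FOR_ALL (SK_F_SG | SK_F_VLAN_CHALLENGED)
#define SK_ALL_FOR_ALL (SK_F_TSO)

static uint32_t sketch_bond_fix_features(const uint32_t *slave_features,
					 size_t nslaves, uint32_t requested)
{
	uint32_t mask = requested;
	uint32_t features = requested;
	size_t i;

	assert(slave_features != NULL || nslaves == 0);

	/* Empty bond: no MAC address yet, so keep VLANs blocked. */
	if (nslaves == 0)
		return features | SK_F_VLAN_CHALLENGED;

	/* Start pessimistic for OR-ed bits, optimistic for AND-ed bits. */
	features &= ~SK_ONE_FOR_ALL;
	features |= SK_ALL_FOR_ALL;

	for (i = 0; i < nslaves; i++) {
		/* OR in requested bits any slave has; a VLAN challenged
		 * slave marks the whole bond, requested or not. */
		features |= slave_features[i] & SK_ONE_FOR_ALL &
			    (mask | SK_F_VLAN_CHALLENGED);
		/* Drop AND-ed bits that this slave lacks. */
		features &= slave_features[i] | ~SK_ALL_FOR_ALL;
	}
	return features;
}
```

In this model an empty bond always reports the VLAN challenged bit, and the
bit goes away as soon as the slave list is non-empty and none of the slaves
carries it.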
-J
>Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
>---
>
>Note: This is only compile tested, yet.
>
> drivers/net/bonding/bond_main.c | 133 +++++++++++++++------------------------
> 1 files changed, 50 insertions(+), 83 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 9a5feaf..04a2205 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -344,32 +344,6 @@ out:
> }
>
> /**
>- * bond_has_challenged_slaves
>- * @bond: the bond we're working on
>- *
>- * Searches the slave list. Returns 1 if a vlan challenged slave
>- * was found, 0 otherwise.
>- *
>- * Assumes bond->lock is held.
>- */
>-static int bond_has_challenged_slaves(struct bonding *bond)
>-{
>- struct slave *slave;
>- int i;
>-
>- bond_for_each_slave(bond, slave, i) {
>- if (slave->dev->features & NETIF_F_VLAN_CHALLENGED) {
>- pr_debug("found VLAN challenged slave - %s\n",
>- slave->dev->name);
>- return 1;
>- }
>- }
>-
>- pr_debug("no VLAN challenged slaves found\n");
>- return 0;
>-}
>-
>-/**
> * bond_next_vlan - safely skip to the next item in the vlans list.
> * @bond: the bond we're working on
> * @curr: item we're advancing from
>@@ -1406,52 +1380,61 @@ static int bond_sethwaddr(struct net_device *bond_dev,
> return 0;
> }
>
>-#define BOND_VLAN_FEATURES \
>- (NETIF_F_VLAN_CHALLENGED | NETIF_F_HW_VLAN_RX | NETIF_F_HW_VLAN_TX | \
>- NETIF_F_HW_VLAN_FILTER)
>-
>-/*
>- * Compute the common dev->feature set available to all slaves. Some
>- * feature bits are managed elsewhere, so preserve those feature bits
>- * on the master device.
>- */
>-static int bond_compute_features(struct bonding *bond)
>+static u32 bond_fix_features(struct net_device *dev, u32 features)
> {
> struct slave *slave;
>- struct net_device *bond_dev = bond->dev;
>- u32 features = bond_dev->features;
>- u32 vlan_features = 0;
>- unsigned short max_hard_header_len = max((u16)ETH_HLEN,
>- bond_dev->hard_header_len);
>+ struct bonding *bond = netdev_priv(dev);
>+ u32 mask;
> int i;
>
>- features &= ~(NETIF_F_ALL_CSUM | BOND_VLAN_FEATURES);
>- features |= NETIF_F_GSO_MASK | NETIF_F_NO_CSUM | NETIF_F_NOCACHE_COPY;
>-
> if (!bond->first_slave)
>- goto done;
>+ /* Disable adding VLANs to empty bond. But why? --mq */
>+ return features | NETIF_F_VLAN_CHALLENGED;
>
>+ mask = features;
> features &= ~NETIF_F_ONE_FOR_ALL;
>+ features |= NETIF_F_ALL_FOR_ALL;
>
>- vlan_features = bond->first_slave->dev->vlan_features;
> bond_for_each_slave(bond, slave, i) {
> features = netdev_increment_features(features,
> slave->dev->features,
>- NETIF_F_ONE_FOR_ALL);
>+ mask);
>+ }
>+
>+ return features;
>+}
>+
>+#define BOND_VLAN_FEATURES (NETIF_F_ALL_TX_OFFLOADS | \
>+ NETIF_F_SOFT_FEATURES | \
>+ NETIF_F_LRO)
>+
>+static void bond_compute_features(struct bonding *bond)
>+{
>+ struct slave *slave;
>+ struct net_device *bond_dev = bond->dev;
>+ u32 old_features, vlan_features = BOND_VLAN_FEATURES;
>+ unsigned short max_hard_header_len = ETH_HLEN;
>+ int i;
>+
>+ if (!bond->first_slave)
>+ goto done;
>+
>+ bond_for_each_slave(bond, slave, i) {
> vlan_features = netdev_increment_features(vlan_features,
>- slave->dev->vlan_features,
>- NETIF_F_ONE_FOR_ALL);
>+ slave->dev->vlan_features, BOND_VLAN_FEATURES);
>+
> if (slave->dev->hard_header_len > max_hard_header_len)
> max_hard_header_len = slave->dev->hard_header_len;
> }
>
> done:
>- features |= (bond_dev->features & BOND_VLAN_FEATURES);
>- bond_dev->features = netdev_fix_features(bond_dev, features);
>- bond_dev->vlan_features = netdev_fix_features(bond_dev, vlan_features);
>+ bond_dev->vlan_features = vlan_features;
> bond_dev->hard_header_len = max_hard_header_len;
>
>- return 0;
>+ old_features = bond_dev->features;
>+ netdev_update_features(bond_dev);
>+ if (old_features == bond_dev->features)
>+ netdev_features_change(bond_dev);
> }
>
> static void bond_setup_by_slave(struct net_device *bond_dev,
>@@ -1544,7 +1527,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
> struct netdev_hw_addr *ha;
> struct sockaddr addr;
> int link_reporting;
>- int old_features = bond_dev->features;
> int res = 0;
>
> if (!bond->params.use_carrier && slave_dev->ethtool_ops == NULL &&
>@@ -1577,16 +1559,9 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
> pr_warning("%s: Warning: enslaved VLAN challenged slave %s. Adding VLANs will be blocked as long as %s is part of bond %s\n",
> bond_dev->name, slave_dev->name,
> slave_dev->name, bond_dev->name);
>- bond_dev->features |= NETIF_F_VLAN_CHALLENGED;
> }
> } else {
> pr_debug("%s: ! NETIF_F_VLAN_CHALLENGED\n", slave_dev->name);
>- if (bond->slave_cnt == 0) {
>- /* First slave, and it is not VLAN challenged,
>- * so remove the block of adding VLANs over the bond.
>- */
>- bond_dev->features &= ~NETIF_F_VLAN_CHALLENGED;
>- }
> }
>
> /*
>@@ -1958,7 +1933,7 @@ err_free:
> kfree(new_slave);
>
> err_undo_flags:
>- bond_dev->features = old_features;
>+ bond_compute_features(bond);
>
> return res;
> }
>@@ -1979,6 +1954,7 @@ int bond_release(struct net_device *bond_dev, struct net_device *slave_dev)
> struct bonding *bond = netdev_priv(bond_dev);
> struct slave *slave, *oldcurrent;
> struct sockaddr addr;
>+ u32 old_features = bond_dev->features;
>
> /* slave is not a slave or master is not master of this slave */
> if (!(slave_dev->flags & IFF_SLAVE) ||
>@@ -2084,19 +2060,16 @@ int bond_release(struct net_device *bond_dev, struct net_device *slave_dev)
> */
> memset(bond_dev->dev_addr, 0, bond_dev->addr_len);
>
>- if (!bond->vlgrp) {
>- bond_dev->features |= NETIF_F_VLAN_CHALLENGED;
>- } else {
>+ if (bond->vlgrp) {
> pr_warning("%s: Warning: clearing HW address of %s while it still has VLANs.\n",
> bond_dev->name, bond_dev->name);
> pr_warning("%s: When re-adding slaves, make sure the bond's HW address matches its VLANs'.\n",
> bond_dev->name);
> }
>- } else if ((bond_dev->features & NETIF_F_VLAN_CHALLENGED) &&
>- !bond_has_challenged_slaves(bond)) {
>+ } else if (!(bond_dev->features & NETIF_F_VLAN_CHALLENGED) &&
>+ old_features & NETIF_F_VLAN_CHALLENGED) {
> pr_info("%s: last VLAN challenged slave %s left bond %s. VLAN blocking is removed\n",
> bond_dev->name, slave_dev->name, bond_dev->name);
>- bond_dev->features &= ~NETIF_F_VLAN_CHALLENGED;
> }
>
> write_unlock_bh(&bond->lock);
>@@ -2269,9 +2242,7 @@ static int bond_release_all(struct net_device *bond_dev)
> */
> memset(bond_dev->dev_addr, 0, bond_dev->addr_len);
>
>- if (!bond->vlgrp) {
>- bond_dev->features |= NETIF_F_VLAN_CHALLENGED;
>- } else {
>+ if (bond->vlgrp) {
> pr_warning("%s: Warning: clearing HW address of %s while it still has VLANs.\n",
> bond_dev->name, bond_dev->name);
> pr_warning("%s: When re-adding slaves, make sure the bond's HW address matches its VLANs'.\n",
>@@ -4347,11 +4318,6 @@ static void bond_ethtool_get_drvinfo(struct net_device *bond_dev,
> static const struct ethtool_ops bond_ethtool_ops = {
> .get_drvinfo = bond_ethtool_get_drvinfo,
> .get_link = ethtool_op_get_link,
>- .get_tx_csum = ethtool_op_get_tx_csum,
>- .get_sg = ethtool_op_get_sg,
>- .get_tso = ethtool_op_get_tso,
>- .get_ufo = ethtool_op_get_ufo,
>- .get_flags = ethtool_op_get_flags,
> };
>
> static const struct net_device_ops bond_netdev_ops = {
>@@ -4377,6 +4343,7 @@ static const struct net_device_ops bond_netdev_ops = {
> #endif
> .ndo_add_slave = bond_enslave,
> .ndo_del_slave = bond_release,
>+ .ndo_fix_features = bond_fix_features,
> };
>
> static void bond_destructor(struct net_device *bond_dev)
>@@ -4432,14 +4399,14 @@ static void bond_setup(struct net_device *bond_dev)
> * when there are slaves that are not hw accel
> * capable
> */
>- bond_dev->features |= (NETIF_F_HW_VLAN_TX |
>- NETIF_F_HW_VLAN_RX |
>- NETIF_F_HW_VLAN_FILTER);
>
>- /* By default, we enable GRO on bonding devices.
>- * Actual support requires lowlevel drivers are GRO ready.
>- */
>- bond_dev->features |= NETIF_F_GRO;
>+ bond_dev->hw_features = BOND_VLAN_FEATURES |
>+ NETIF_F_HW_VLAN_TX |
>+ NETIF_F_HW_VLAN_RX |
>+ NETIF_F_HW_VLAN_FILTER;
>+
>+ bond_dev->hw_features &= ~(NETIF_F_ALL_CSUM & ~NETIF_F_NO_CSUM);
>+ bond_dev->features |= bond_dev->hw_features;
> }
>
> static void bond_work_cancel_all(struct bonding *bond)
>--
>1.7.2.5
>
---
-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
^ permalink raw reply
* [PATCH] vmxnet3: Consistently disable irqs when taking adapter->cmd_lock
From: Roland Dreier @ 2011-05-06 18:32 UTC (permalink / raw)
To: Shreyas Bhatewara, David S. Miller; +Cc: pv-drivers, netdev
From: Roland Dreier <roland@purestorage.com>
Using the vmxnet3 driver produces a lockdep warning because
vmxnet3_set_mc(), which is called with mc->mca_lock held, takes
adapter->cmd_lock. However, there are a couple of places where
adapter->cmd_lock is taken with softirqs enabled, so lockdep warns that
a softirq that tries to take mc->mca_lock could happen while
adapter->cmd_lock is held, leading to an AB-BA deadlock.
I'm not sure if this is a real potential deadlock or not, but the
simplest and best fix seems to be simply to make sure we take cmd_lock
with spin_lock_irqsave() everywhere -- the places with plain spin_lock
just look like oversights.
The full enormous lockdep warning is:
=========================================================
[ INFO: possible irq lock inversion dependency detected ]
2.6.39-rc6+ #1
---------------------------------------------------------
ifconfig/567 just changed the state of lock:
(&(&mc->mca_lock)->rlock){+.-...}, at: [<ffffffff81531e9f>] mld_ifc_timer_expire+0xff/0x280
but this lock took another, SOFTIRQ-unsafe lock in the past:
(&(&adapter->cmd_lock)->rlock){+.+...}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
4 locks held by ifconfig/567:
#0: (rtnl_mutex){+.+.+.}, at: [<ffffffff8147d547>] rtnl_lock+0x17/0x20
#1: ((inetaddr_chain).rwsem){.+.+.+}, at: [<ffffffff810896cf>] __blocking_notifier_call_chain+0x5f/0xb0
#2: (&idev->mc_ifc_timer){+.-...}, at: [<ffffffff8106f21b>] run_timer_softirq+0xeb/0x3f0
#3: (&ndev->lock){++.-..}, at: [<ffffffff81531dd2>] mld_ifc_timer_expire+0x32/0x280
the shortest dependencies between 2nd lock and 1st lock:
-> (&(&adapter->cmd_lock)->rlock){+.+...} ops: 11 {
HARDIRQ-ON-W at:
[<ffffffff8109ad86>] __lock_acquire+0x7f6/0x1e10
[<ffffffff8109ca4d>] lock_acquire+0x9d/0x130
[<ffffffff81571156>] _raw_spin_lock+0x36/0x70
[<ffffffffa000d212>] vmxnet3_alloc_intr_resources+0x22/0x230 [vmxnet3]
[<ffffffffa0014031>] vmxnet3_probe_device+0x5f6/0x15c5 [vmxnet3]
[<ffffffff812df67f>] local_pci_probe+0x5f/0xd0
[<ffffffff812dfde9>] pci_device_probe+0x119/0x120
[<ffffffff81373df6>] driver_probe_device+0x96/0x1c0
[<ffffffff81373fcb>] __driver_attach+0xab/0xb0
[<ffffffff81372a1e>] bus_for_each_dev+0x5e/0x90
[<ffffffff81373a2e>] driver_attach+0x1e/0x20
[<ffffffff813735b8>] bus_add_driver+0xc8/0x290
[<ffffffff813745b6>] driver_register+0x76/0x140
[<ffffffff812e0046>] __pci_register_driver+0x66/0xe0
[<ffffffffa001b03a>] serio_raw_poll+0x3a/0x60 [serio_raw]
[<ffffffff81002165>] do_one_initcall+0x45/0x190
[<ffffffff810aa76b>] sys_init_module+0xfb/0x250
[<ffffffff8157a142>] system_call_fastpath+0x16/0x1b
SOFTIRQ-ON-W at:
[<ffffffff8109adb7>] __lock_acquire+0x827/0x1e10
[<ffffffff8109ca4d>] lock_acquire+0x9d/0x130
[<ffffffff81571156>] _raw_spin_lock+0x36/0x70
[<ffffffffa000d212>] vmxnet3_alloc_intr_resources+0x22/0x230 [vmxnet3]
[<ffffffffa0014031>] vmxnet3_probe_device+0x5f6/0x15c5 [vmxnet3]
[<ffffffff812df67f>] local_pci_probe+0x5f/0xd0
[<ffffffff812dfde9>] pci_device_probe+0x119/0x120
[<ffffffff81373df6>] driver_probe_device+0x96/0x1c0
[<ffffffff81373fcb>] __driver_attach+0xab/0xb0
[<ffffffff81372a1e>] bus_for_each_dev+0x5e/0x90
[<ffffffff81373a2e>] driver_attach+0x1e/0x20
[<ffffffff813735b8>] bus_add_driver+0xc8/0x290
[<ffffffff813745b6>] driver_register+0x76/0x140
[<ffffffff812e0046>] __pci_register_driver+0x66/0xe0
[<ffffffffa001b03a>] serio_raw_poll+0x3a/0x60 [serio_raw]
[<ffffffff81002165>] do_one_initcall+0x45/0x190
[<ffffffff810aa76b>] sys_init_module+0xfb/0x250
[<ffffffff8157a142>] system_call_fastpath+0x16/0x1b
INITIAL USE at:
[<ffffffff8109a9e9>] __lock_acquire+0x459/0x1e10
[<ffffffff8109ca4d>] lock_acquire+0x9d/0x130
[<ffffffff81571156>] _raw_spin_lock+0x36/0x70
[<ffffffffa000d212>] vmxnet3_alloc_intr_resources+0x22/0x230 [vmxnet3]
[<ffffffffa0014031>] vmxnet3_probe_device+0x5f6/0x15c5 [vmxnet3]
[<ffffffff812df67f>] local_pci_probe+0x5f/0xd0
[<ffffffff812dfde9>] pci_device_probe+0x119/0x120
[<ffffffff81373df6>] driver_probe_device+0x96/0x1c0
[<ffffffff81373fcb>] __driver_attach+0xab/0xb0
[<ffffffff81372a1e>] bus_for_each_dev+0x5e/0x90
[<ffffffff81373a2e>] driver_attach+0x1e/0x20
[<ffffffff813735b8>] bus_add_driver+0xc8/0x290
[<ffffffff813745b6>] driver_register+0x76/0x140
[<ffffffff812e0046>] __pci_register_driver+0x66/0xe0
[<ffffffffa001b03a>] serio_raw_poll+0x3a/0x60 [serio_raw]
[<ffffffff81002165>] do_one_initcall+0x45/0x190
[<ffffffff810aa76b>] sys_init_module+0xfb/0x250
[<ffffffff8157a142>] system_call_fastpath+0x16/0x1b
}
... key at: [<ffffffffa0017590>] __key.42516+0x0/0xffffffffffffda70 [vmxnet3]
... acquired at:
[<ffffffff8109ca4d>] lock_acquire+0x9d/0x130
[<ffffffff81571bb5>] _raw_spin_lock_irqsave+0x55/0xa0
[<ffffffffa000de27>] vmxnet3_set_mc+0x97/0x1a0 [vmxnet3]
[<ffffffff8146ffa0>] __dev_set_rx_mode+0x40/0xb0
[<ffffffff81470040>] dev_set_rx_mode+0x30/0x50
[<ffffffff81470127>] __dev_open+0xc7/0x100
[<ffffffff814703c1>] __dev_change_flags+0xa1/0x180
[<ffffffff81470568>] dev_change_flags+0x28/0x70
[<ffffffff814da960>] devinet_ioctl+0x730/0x800
[<ffffffff814db508>] inet_ioctl+0x88/0xa0
[<ffffffff814541f0>] sock_do_ioctl+0x30/0x70
[<ffffffff814542a9>] sock_ioctl+0x79/0x2f0
[<ffffffff81188798>] do_vfs_ioctl+0x98/0x570
[<ffffffff81188d01>] sys_ioctl+0x91/0xa0
[<ffffffff8157a142>] system_call_fastpath+0x16/0x1b
-> (_xmit_ETHER){+.....} ops: 6 {
HARDIRQ-ON-W at:
[<ffffffff8109ad86>] __lock_acquire+0x7f6/0x1e10
[<ffffffff8109ca4d>] lock_acquire+0x9d/0x130
[<ffffffff8157124b>] _raw_spin_lock_bh+0x3b/0x70
[<ffffffff81475618>] __dev_mc_add+0x38/0x90
[<ffffffff814756a0>] dev_mc_add+0x10/0x20
[<ffffffff81532c9e>] igmp6_group_added+0x10e/0x1b0
[<ffffffff81533f2d>] ipv6_dev_mc_inc+0x2cd/0x430
[<ffffffff81515e17>] ipv6_add_dev+0x357/0x450
[<ffffffff81519f27>] addrconf_notify+0x2f7/0xb10
[<ffffffff81575c1c>] notifier_call_chain+0x8c/0xc0
[<ffffffff81089586>] raw_notifier_call_chain+0x16/0x20
[<ffffffff814689b7>] call_netdevice_notifiers+0x37/0x70
[<ffffffff8146a944>] register_netdevice+0x244/0x2d0
[<ffffffff8146aa0f>] register_netdev+0x3f/0x60
[<ffffffffa001419b>] vmxnet3_probe_device+0x760/0x15c5 [vmxnet3]
[<ffffffff812df67f>] local_pci_probe+0x5f/0xd0
[<ffffffff812dfde9>] pci_device_probe+0x119/0x120
[<ffffffff81373df6>] driver_probe_device+0x96/0x1c0
[<ffffffff81373fcb>] __driver_attach+0xab/0xb0
[<ffffffff81372a1e>] bus_for_each_dev+0x5e/0x90
[<ffffffff81373a2e>] driver_attach+0x1e/0x20
[<ffffffff813735b8>] bus_add_driver+0xc8/0x290
[<ffffffff813745b6>] driver_register+0x76/0x140
[<ffffffff812e0046>] __pci_register_driver+0x66/0xe0
[<ffffffffa001b03a>] serio_raw_poll+0x3a/0x60 [serio_raw]
[<ffffffff81002165>] do_one_initcall+0x45/0x190
[<ffffffff810aa76b>] sys_init_module+0xfb/0x250
[<ffffffff8157a142>] system_call_fastpath+0x16/0x1b
INITIAL USE at:
[<ffffffff8109a9e9>] __lock_acquire+0x459/0x1e10
[<ffffffff8109ca4d>] lock_acquire+0x9d/0x130
[<ffffffff8157124b>] _raw_spin_lock_bh+0x3b/0x70
[<ffffffff81475618>] __dev_mc_add+0x38/0x90
[<ffffffff814756a0>] dev_mc_add+0x10/0x20
[<ffffffff81532c9e>] igmp6_group_added+0x10e/0x1b0
[<ffffffff81533f2d>] ipv6_dev_mc_inc+0x2cd/0x430
[<ffffffff81515e17>] ipv6_add_dev+0x357/0x450
[<ffffffff81519f27>] addrconf_notify+0x2f7/0xb10
[<ffffffff81575c1c>] notifier_call_chain+0x8c/0xc0
[<ffffffff81089586>] raw_notifier_call_chain+0x16/0x20
[<ffffffff814689b7>] call_netdevice_notifiers+0x37/0x70
[<ffffffff8146a944>] register_netdevice+0x244/0x2d0
[<ffffffff8146aa0f>] register_netdev+0x3f/0x60
[<ffffffffa001419b>] vmxnet3_probe_device+0x760/0x15c5 [vmxnet3]
[<ffffffff812df67f>] local_pci_probe+0x5f/0xd0
[<ffffffff812dfde9>] pci_device_probe+0x119/0x120
[<ffffffff81373df6>] driver_probe_device+0x96/0x1c0
[<ffffffff81373fcb>] __driver_attach+0xab/0xb0
[<ffffffff81372a1e>] bus_for_each_dev+0x5e/0x90
[<ffffffff81373a2e>] driver_attach+0x1e/0x20
[<ffffffff813735b8>] bus_add_driver+0xc8/0x290
[<ffffffff813745b6>] driver_register+0x76/0x140
[<ffffffff812e0046>] __pci_register_driver+0x66/0xe0
[<ffffffffa001b03a>] serio_raw_poll+0x3a/0x60 [serio_raw]
[<ffffffff81002165>] do_one_initcall+0x45/0x190
[<ffffffff810aa76b>] sys_init_module+0xfb/0x250
[<ffffffff8157a142>] system_call_fastpath+0x16/0x1b
}
... key at: [<ffffffff827fd868>] netdev_addr_lock_key+0x8/0x1e0
... acquired at:
[<ffffffff8109ca4d>] lock_acquire+0x9d/0x130
[<ffffffff8157124b>] _raw_spin_lock_bh+0x3b/0x70
[<ffffffff81475618>] __dev_mc_add+0x38/0x90
[<ffffffff814756a0>] dev_mc_add+0x10/0x20
[<ffffffff81532c9e>] igmp6_group_added+0x10e/0x1b0
[<ffffffff81533f2d>] ipv6_dev_mc_inc+0x2cd/0x430
[<ffffffff81515e17>] ipv6_add_dev+0x357/0x450
[<ffffffff81519f27>] addrconf_notify+0x2f7/0xb10
[<ffffffff81575c1c>] notifier_call_chain+0x8c/0xc0
[<ffffffff81089586>] raw_notifier_call_chain+0x16/0x20
[<ffffffff814689b7>] call_netdevice_notifiers+0x37/0x70
[<ffffffff8146a944>] register_netdevice+0x244/0x2d0
[<ffffffff8146aa0f>] register_netdev+0x3f/0x60
[<ffffffffa001419b>] vmxnet3_probe_device+0x760/0x15c5 [vmxnet3]
[<ffffffff812df67f>] local_pci_probe+0x5f/0xd0
[<ffffffff812dfde9>] pci_device_probe+0x119/0x120
[<ffffffff81373df6>] driver_probe_device+0x96/0x1c0
[<ffffffff81373fcb>] __driver_attach+0xab/0xb0
[<ffffffff81372a1e>] bus_for_each_dev+0x5e/0x90
[<ffffffff81373a2e>] driver_attach+0x1e/0x20
[<ffffffff813735b8>] bus_add_driver+0xc8/0x290
[<ffffffff813745b6>] driver_register+0x76/0x140
[<ffffffff812e0046>] __pci_register_driver+0x66/0xe0
[<ffffffffa001b03a>] serio_raw_poll+0x3a/0x60 [serio_raw]
[<ffffffff81002165>] do_one_initcall+0x45/0x190
[<ffffffff810aa76b>] sys_init_module+0xfb/0x250
[<ffffffff8157a142>] system_call_fastpath+0x16/0x1b
-> (&(&mc->mca_lock)->rlock){+.-...} ops: 6 {
HARDIRQ-ON-W at:
[<ffffffff8109ad86>] __lock_acquire+0x7f6/0x1e10
[<ffffffff8109ca4d>] lock_acquire+0x9d/0x130
[<ffffffff8157124b>] _raw_spin_lock_bh+0x3b/0x70
[<ffffffff81532bd5>] igmp6_group_added+0x45/0x1b0
[<ffffffff81533f2d>] ipv6_dev_mc_inc+0x2cd/0x430
[<ffffffff81515e17>] ipv6_add_dev+0x357/0x450
[<ffffffff81ce0d16>] addrconf_init+0x4e/0x183
[<ffffffff81ce0ba1>] inet6_init+0x191/0x2a6
[<ffffffff81002165>] do_one_initcall+0x45/0x190
[<ffffffff81ca4d3f>] kernel_init+0xe3/0x168
[<ffffffff8157b2e4>] kernel_thread_helper+0x4/0x10
IN-SOFTIRQ-W at:
[<ffffffff8109ad5e>] __lock_acquire+0x7ce/0x1e10
[<ffffffff8109ca4d>] lock_acquire+0x9d/0x130
[<ffffffff8157124b>] _raw_spin_lock_bh+0x3b/0x70
[<ffffffff81531e9f>] mld_ifc_timer_expire+0xff/0x280
[<ffffffff8106f2a9>] run_timer_softirq+0x179/0x3f0
[<ffffffff810666d0>] __do_softirq+0xc0/0x210
[<ffffffff8157b3dc>] call_softirq+0x1c/0x30
[<ffffffff8100d42d>] do_softirq+0xad/0xe0
[<ffffffff81066afe>] irq_exit+0x9e/0xb0
[<ffffffff8157bd40>] smp_apic_timer_interrupt+0x70/0x9b
[<ffffffff8157ab93>] apic_timer_interrupt+0x13/0x20
[<ffffffff8149d857>] rt_do_flush+0x87/0x2a0
[<ffffffff814a16b6>] rt_cache_flush+0x46/0x60
[<ffffffff814e36e0>] fib_disable_ip+0x40/0x60
[<ffffffff814e5447>] fib_inetaddr_event+0xd7/0xe0
[<ffffffff81575c1c>] notifier_call_chain+0x8c/0xc0
[<ffffffff810896e8>] __blocking_notifier_call_chain+0x78/0xb0
[<ffffffff81089736>] blocking_notifier_call_chain+0x16/0x20
[<ffffffff814d8021>] __inet_del_ifa+0xf1/0x2e0
[<ffffffff814d8223>] inet_del_ifa+0x13/0x20
[<ffffffff814da731>] devinet_ioctl+0x501/0x800
[<ffffffff814db508>] inet_ioctl+0x88/0xa0
[<ffffffff814541f0>] sock_do_ioctl+0x30/0x70
[<ffffffff814542a9>] sock_ioctl+0x79/0x2f0
[<ffffffff81188798>] do_vfs_ioctl+0x98/0x570
[<ffffffff81188d01>] sys_ioctl+0x91/0xa0
[<ffffffff8157a142>] system_call_fastpath+0x16/0x1b
INITIAL USE at:
[<ffffffff8109a9e9>] __lock_acquire+0x459/0x1e10
[<ffffffff8109ca4d>] lock_acquire+0x9d/0x130
[<ffffffff8157124b>] _raw_spin_lock_bh+0x3b/0x70
[<ffffffff81532bd5>] igmp6_group_added+0x45/0x1b0
[<ffffffff81533f2d>] ipv6_dev_mc_inc+0x2cd/0x430
[<ffffffff81515e17>] ipv6_add_dev+0x357/0x450
[<ffffffff81ce0d16>] addrconf_init+0x4e/0x183
[<ffffffff81ce0ba1>] inet6_init+0x191/0x2a6
[<ffffffff81002165>] do_one_initcall+0x45/0x190
[<ffffffff81ca4d3f>] kernel_init+0xe3/0x168
[<ffffffff8157b2e4>] kernel_thread_helper+0x4/0x10
}
... key at: [<ffffffff82801be2>] __key.40877+0x0/0x8
... acquired at:
[<ffffffff810997bc>] check_usage_forwards+0x9c/0x110
[<ffffffff8109a32c>] mark_lock+0x19c/0x400
[<ffffffff8109ad5e>] __lock_acquire+0x7ce/0x1e10
[<ffffffff8109ca4d>] lock_acquire+0x9d/0x130
[<ffffffff8157124b>] _raw_spin_lock_bh+0x3b/0x70
[<ffffffff81531e9f>] mld_ifc_timer_expire+0xff/0x280
[<ffffffff8106f2a9>] run_timer_softirq+0x179/0x3f0
[<ffffffff810666d0>] __do_softirq+0xc0/0x210
[<ffffffff8157b3dc>] call_softirq+0x1c/0x30
[<ffffffff8100d42d>] do_softirq+0xad/0xe0
[<ffffffff81066afe>] irq_exit+0x9e/0xb0
[<ffffffff8157bd40>] smp_apic_timer_interrupt+0x70/0x9b
[<ffffffff8157ab93>] apic_timer_interrupt+0x13/0x20
[<ffffffff8149d857>] rt_do_flush+0x87/0x2a0
[<ffffffff814a16b6>] rt_cache_flush+0x46/0x60
[<ffffffff814e36e0>] fib_disable_ip+0x40/0x60
[<ffffffff814e5447>] fib_inetaddr_event+0xd7/0xe0
[<ffffffff81575c1c>] notifier_call_chain+0x8c/0xc0
[<ffffffff810896e8>] __blocking_notifier_call_chain+0x78/0xb0
[<ffffffff81089736>] blocking_notifier_call_chain+0x16/0x20
[<ffffffff814d8021>] __inet_del_ifa+0xf1/0x2e0
[<ffffffff814d8223>] inet_del_ifa+0x13/0x20
[<ffffffff814da731>] devinet_ioctl+0x501/0x800
[<ffffffff814db508>] inet_ioctl+0x88/0xa0
[<ffffffff814541f0>] sock_do_ioctl+0x30/0x70
[<ffffffff814542a9>] sock_ioctl+0x79/0x2f0
[<ffffffff81188798>] do_vfs_ioctl+0x98/0x570
[<ffffffff81188d01>] sys_ioctl+0x91/0xa0
[<ffffffff8157a142>] system_call_fastpath+0x16/0x1b
stack backtrace:
Pid: 567, comm: ifconfig Not tainted 2.6.39-rc6+ #1
Call Trace:
<IRQ> [<ffffffff810996f6>] print_irq_inversion_bug+0x146/0x170
[<ffffffff81099720>] ? print_irq_inversion_bug+0x170/0x170
[<ffffffff810997bc>] check_usage_forwards+0x9c/0x110
[<ffffffff8109a32c>] mark_lock+0x19c/0x400
[<ffffffff8109ad5e>] __lock_acquire+0x7ce/0x1e10
[<ffffffff8109a383>] ? mark_lock+0x1f3/0x400
[<ffffffff8109b497>] ? __lock_acquire+0xf07/0x1e10
[<ffffffff81012255>] ? native_sched_clock+0x15/0x70
[<ffffffff8109ca4d>] lock_acquire+0x9d/0x130
[<ffffffff81531e9f>] ? mld_ifc_timer_expire+0xff/0x280
[<ffffffff8109759d>] ? lock_release_holdtime+0x3d/0x1a0
[<ffffffff8157124b>] _raw_spin_lock_bh+0x3b/0x70
[<ffffffff81531e9f>] ? mld_ifc_timer_expire+0xff/0x280
[<ffffffff8157170b>] ? _raw_spin_unlock+0x2b/0x40
[<ffffffff81531e9f>] mld_ifc_timer_expire+0xff/0x280
[<ffffffff8106f2a9>] run_timer_softirq+0x179/0x3f0
[<ffffffff8106f21b>] ? run_timer_softirq+0xeb/0x3f0
[<ffffffff810122b9>] ? sched_clock+0x9/0x10
[<ffffffff81531da0>] ? mld_gq_timer_expire+0x30/0x30
[<ffffffff810666d0>] __do_softirq+0xc0/0x210
[<ffffffff8109455f>] ? tick_program_event+0x1f/0x30
[<ffffffff8157b3dc>] call_softirq+0x1c/0x30
[<ffffffff8100d42d>] do_softirq+0xad/0xe0
[<ffffffff81066afe>] irq_exit+0x9e/0xb0
[<ffffffff8157bd40>] smp_apic_timer_interrupt+0x70/0x9b
[<ffffffff8157ab93>] apic_timer_interrupt+0x13/0x20
<EOI> [<ffffffff81571f14>] ? retint_restore_args+0x13/0x13
[<ffffffff810974a7>] ? lock_is_held+0x17/0xd0
[<ffffffff8149d857>] rt_do_flush+0x87/0x2a0
[<ffffffff814a16b6>] rt_cache_flush+0x46/0x60
[<ffffffff814e36e0>] fib_disable_ip+0x40/0x60
[<ffffffff814e5447>] fib_inetaddr_event+0xd7/0xe0
[<ffffffff81575c1c>] notifier_call_chain+0x8c/0xc0
[<ffffffff810896e8>] __blocking_notifier_call_chain+0x78/0xb0
[<ffffffff81089736>] blocking_notifier_call_chain+0x16/0x20
[<ffffffff814d8021>] __inet_del_ifa+0xf1/0x2e0
[<ffffffff814d8223>] inet_del_ifa+0x13/0x20
[<ffffffff814da731>] devinet_ioctl+0x501/0x800
[<ffffffff8108a3af>] ? local_clock+0x6f/0x80
[<ffffffff81575898>] ? do_page_fault+0x268/0x560
[<ffffffff814db508>] inet_ioctl+0x88/0xa0
[<ffffffff814541f0>] sock_do_ioctl+0x30/0x70
[<ffffffff814542a9>] sock_ioctl+0x79/0x2f0
[<ffffffff810dfe87>] ? __call_rcu+0xa7/0x190
[<ffffffff81188798>] do_vfs_ioctl+0x98/0x570
[<ffffffff8117737e>] ? fget_light+0x33e/0x430
[<ffffffff81571ef9>] ? retint_swapgs+0x13/0x1b
[<ffffffff81188d01>] sys_ioctl+0x91/0xa0
[<ffffffff8157a142>] system_call_fastpath+0x16/0x1b
Signed-off-by: Roland Dreier <roland@purestorage.com>
---
drivers/net/vmxnet3/vmxnet3_drv.c | 10 ++++++----
1 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index 0d47c3a..c16ed96 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -178,6 +178,7 @@ static void
vmxnet3_process_events(struct vmxnet3_adapter *adapter)
{
int i;
+ unsigned long flags;
u32 events = le32_to_cpu(adapter->shared->ecr);
if (!events)
return;
@@ -190,10 +191,10 @@ vmxnet3_process_events(struct vmxnet3_adapter *adapter)
/* Check if there is an error on xmit/recv queues */
if (events & (VMXNET3_ECR_TQERR | VMXNET3_ECR_RQERR)) {
- spin_lock(&adapter->cmd_lock);
+ spin_lock_irqsave(&adapter->cmd_lock, flags);
VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD,
VMXNET3_CMD_GET_QUEUE_STATUS);
- spin_unlock(&adapter->cmd_lock);
+ spin_unlock_irqrestore(&adapter->cmd_lock, flags);
for (i = 0; i < adapter->num_tx_queues; i++)
if (adapter->tqd_start[i].status.stopped)
@@ -2733,13 +2734,14 @@ static void
vmxnet3_alloc_intr_resources(struct vmxnet3_adapter *adapter)
{
u32 cfg;
+ unsigned long flags;
/* intr settings */
- spin_lock(&adapter->cmd_lock);
+ spin_lock_irqsave(&adapter->cmd_lock, flags);
VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD,
VMXNET3_CMD_GET_CONF_INTR);
cfg = VMXNET3_READ_BAR1_REG(adapter, VMXNET3_REG_CMD);
- spin_unlock(&adapter->cmd_lock);
+ spin_unlock_irqrestore(&adapter->cmd_lock, flags);
adapter->intr.type = cfg & 0x3;
adapter->intr.mask_mode = (cfg >> 2) & 0x3;
^ permalink raw reply related
* [PATCH] hamachi: Delete TX checksumming code commented out since 1999
From: David Miller @ 2011-05-06 18:59 UTC (permalink / raw)
To: netdev
TX checksumming support has been ifdef commented out of this driver
for more than 10 years, and it makes references to aspects of the IPv4
stack from back then as well.
If someone has one of these rare cards and wants to properly resurrect
TX checksumming support, they can still get at this code in the
version control history.
Signed-off-by: David S. Miller <davem@davemloft.net>
---
I stumbled over this cruft while auditing ip_queue_xmit() users.
drivers/net/hamachi.c | 79 -------------------------------------------------
1 files changed, 0 insertions(+), 79 deletions(-)
diff --git a/drivers/net/hamachi.c b/drivers/net/hamachi.c
index 80d25ed..f5fba73 100644
--- a/drivers/net/hamachi.c
+++ b/drivers/net/hamachi.c
@@ -132,14 +132,8 @@ static int tx_params[MAX_UNITS] = {-1, -1, -1, -1, -1, -1, -1, -1};
/*
* RX_CHECKSUM turns on card-generated receive checksum generation for
* TCP and UDP packets. Otherwise the upper layers do the calculation.
- * TX_CHECKSUM won't do anything too useful, even if it works. There's no
- * easy mechanism by which to tell the TCP/UDP stack that it need not
- * generate checksums for this device. But if somebody can find a way
- * to get that to work, most of the card work is in here already.
* 3/10/1999 Pete Wyckoff <wyckoff@ca.sandia.gov>
*/
-#undef TX_CHECKSUM
-#define RX_CHECKSUM
/* Operational parameters that usually are not changed. */
/* Time in jiffies before concluding the transmitter is hung. */
@@ -630,11 +624,6 @@ static int __devinit hamachi_init_one (struct pci_dev *pdev,
SET_NETDEV_DEV(dev, &pdev->dev);
-#ifdef TX_CHECKSUM
- printk("check that skbcopy in ip_queue_xmit isn't happening\n");
- dev->hard_header_len += 8; /* for cksum tag */
-#endif
-
for (i = 0; i < 6; i++)
dev->dev_addr[i] = 1 ? read_eeprom(ioaddr, 4 + i)
: readb(ioaddr + StationAddr + i);
@@ -937,11 +926,7 @@ static int hamachi_open(struct net_device *dev)
/* always 1, takes no more time to do it */
writew(0x0001, ioaddr + RxChecksum);
-#ifdef TX_CHECKSUM
- writew(0x0001, ioaddr + TxChecksum);
-#else
writew(0x0000, ioaddr + TxChecksum);
-#endif
writew(0x8000, ioaddr + MACCnfg); /* Soft reset the MAC */
writew(0x215F, ioaddr + MACCnfg);
writew(0x000C, ioaddr + FrameGap0);
@@ -1226,40 +1211,6 @@ static void hamachi_init_ring(struct net_device *dev)
}
-#ifdef TX_CHECKSUM
-#define csum_add(it, val) \
-do { \
- it += (u16) (val); \
- if (it & 0xffff0000) { \
- it &= 0xffff; \
- ++it; \
- } \
-} while (0)
- /* printk("add %04x --> %04x\n", val, it); \ */
-
-/* uh->len already network format, do not swap */
-#define pseudo_csum_udp(sum,ih,uh) do { \
- sum = 0; \
- csum_add(sum, (ih)->saddr >> 16); \
- csum_add(sum, (ih)->saddr & 0xffff); \
- csum_add(sum, (ih)->daddr >> 16); \
- csum_add(sum, (ih)->daddr & 0xffff); \
- csum_add(sum, cpu_to_be16(IPPROTO_UDP)); \
- csum_add(sum, (uh)->len); \
-} while (0)
-
-/* swap len */
-#define pseudo_csum_tcp(sum,ih,len) do { \
- sum = 0; \
- csum_add(sum, (ih)->saddr >> 16); \
- csum_add(sum, (ih)->saddr & 0xffff); \
- csum_add(sum, (ih)->daddr >> 16); \
- csum_add(sum, (ih)->daddr & 0xffff); \
- csum_add(sum, cpu_to_be16(IPPROTO_TCP)); \
- csum_add(sum, htons(len)); \
-} while (0)
-#endif
-
static netdev_tx_t hamachi_start_xmit(struct sk_buff *skb,
struct net_device *dev)
{
@@ -1292,36 +1243,6 @@ static netdev_tx_t hamachi_start_xmit(struct sk_buff *skb,
hmp->tx_skbuff[entry] = skb;
-#ifdef TX_CHECKSUM
- {
- /* tack on checksum tag */
- u32 tagval = 0;
- struct ethhdr *eh = (struct ethhdr *)skb->data;
- if (eh->h_proto == cpu_to_be16(ETH_P_IP)) {
- struct iphdr *ih = (struct iphdr *)((char *)eh + ETH_HLEN);
- if (ih->protocol == IPPROTO_UDP) {
- struct udphdr *uh
- = (struct udphdr *)((char *)ih + ih->ihl*4);
- u32 offset = ((unsigned char *)uh + 6) - skb->data;
- u32 pseudo;
- pseudo_csum_udp(pseudo, ih, uh);
- pseudo = htons(pseudo);
- printk("udp cksum was %04x, sending pseudo %04x\n",
- uh->check, pseudo);
- uh->check = 0; /* zero out uh->check before card calc */
- /*
- * start at 14 (skip ethhdr), store at offset (uh->check),
- * use pseudo value given.
- */
- tagval = (14 << 24) | (offset << 16) | pseudo;
- } else if (ih->protocol == IPPROTO_TCP) {
- printk("tcp, no auto cksum\n");
- }
- }
- *(u32 *)skb_push(skb, 8) = tagval;
- }
-#endif
-
hmp->tx_ring[entry].addr = cpu_to_leXX(pci_map_single(hmp->pci_dev,
skb->data, skb->len, PCI_DMA_TODEVICE));
--
1.7.5.1
^ permalink raw reply related
* Re: [PATCH] vmxnet3: Consistently disable irqs when taking adapter->cmd_lock
From: Shreyas Bhatewara @ 2011-05-06 19:21 UTC (permalink / raw)
To: Roland Dreier
Cc: David S. Miller, pv-drivers@vmware.com, netdev@vger.kernel.org
In-Reply-To: <1304706773-21348-1-git-send-email-roland@kernel.org>
On Fri, 6 May 2011, Roland Dreier wrote:
> From: Roland Dreier <roland@purestorage.com>
>
> Using the vmxnet3 driver produces a lockdep warning because
> Signed-off-by: Roland Dreier <roland@purestorage.com>
Roland, thanks for the analysis and the patch.
Signed-off-by: Shreyas N Bhatewara <sbhatewara@vmware.com>
^ permalink raw reply
* ip_queue_xmit() used illegally
From: David Miller @ 2011-05-06 19:26 UTC (permalink / raw)
To: netdev; +Cc: vladislav.yasevich, yjwei, jchapman
Several users of ip_queue_xmit() use it illegally.
I've only audited L2TP and SCTP so far, and they both cannot use
ip_queue_xmit() with the way they operate currently.
The issue surrounds how the socket binding is maintained in
inet->inet_daddr, inet->inet_saddr etc.
TCP does things right, in that ip_queue_xmit() is only invoked with
inet->inet_daddr and inet->inet_saddr having fully resolved, final,
fully connected values.
This is an absolute requirement because if the socket's route
invalidates (which happens completely asynchronously) it's going to
lookup a new route using whatever is stored in
inet->inet_{daddr,saddr} and then use those addresses to build the
packet. Even if ->inet_{saddr,daddr} are both zero this will still
emit a packet (bonus points if you know what addresses will be picked,
no peeking at route.c :-).
SCTP stores its binding information using transports and associations
and does not fill in the ->inet_{daddr,saddr} values.
It tries to work around this route issue by checking dst->obsolete
directly in sctp_packet_transmit(), which just makes the race smaller
and does not eliminate it. ip_queue_xmit() can still end up with
__sk_dst_check() returning NULL and then we end up emitting a
potentially bogus packet.
L2TP supports more of a datagram type socket semantic than a stream
one; it allows unconnected modes of operation. And for this reason
it also cannot use ip_queue_xmit() legally.
After a quick cursory scan it seems like DCCP is OK.
I think SCTP could potentially be fixed by simply filling in the
inet->inet_{daddr,saddr} values when it makes an internal binding
of the transport via sctp_transport_route().
L2TP on the other hand will need to use another interface to send ipv4
packets because it allows disconnected operation.
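To make the route-invalidation problem described above concrete, here is a
minimal userspace sketch. It is a toy model, not kernel code: struct toy_sock,
struct toy_route, toy_queue_xmit(), toy_invalidate() and toy_transport_route()
are all hypothetical names, and the model does not try to reproduce which
addresses the real route lookup picks when the stored ones are zero. It only
shows that the re-lookup consults the socket-level address, so a protocol
that keeps its peer address elsewhere can end up with a different destination.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; none of these are kernel structures. */
struct toy_route {
	uint32_t daddr;		/* destination the cached route points at */
	bool obsolete;		/* set when the route has been invalidated */
};

struct toy_sock {
	uint32_t inet_daddr;		/* socket-level peer address */
	uint32_t transport_daddr;	/* protocol-private peer address */
	struct toy_route cached;	/* cached route for this socket */
};

/* Models asynchronous invalidation of the cached route. */
void toy_invalidate(struct toy_sock *sk)
{
	assert(sk != NULL);
	sk->cached.obsolete = true;
}

/*
 * Models the transmit path: if the cached route is gone, a new one is
 * looked up using only the socket-level address. Returns the destination
 * that would be written into the packet.
 */
uint32_t toy_queue_xmit(struct toy_sock *sk)
{
	assert(sk != NULL);
	if (sk->cached.obsolete) {
		sk->cached.daddr = sk->inet_daddr;
		sk->cached.obsolete = false;
	}
	return sk->cached.daddr;
}

/*
 * Models the suggested direction for a fix: when the transport is routed,
 * mirror its address into the socket-level field as well.
 */
void toy_transport_route(struct toy_sock *sk)
{
	assert(sk != NULL);
	sk->inet_daddr = sk->transport_daddr;
	sk->cached.daddr = sk->transport_daddr;
	sk->cached.obsolete = false;
}
```

In this model, a socket whose inet_daddr was never filled in sends to the
right peer only while the cached route survives. After toy_invalidate() the
next toy_queue_xmit() returns the unset socket-level value instead. Calling
toy_transport_route() first keeps the destination stable across invalidation.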
^ permalink raw reply
* future developments of usbnet
From: Oliver Neukum @ 2011-05-06 18:45 UTC (permalink / raw)
To: netdev, linux-usb
Hi,
I'd like to get a feeling for what people out there are working on regarding usbnet.
So please, if you do something, or think something ought to be done, please
speak up now.
IMHO usbnet needs better support for
- batching protocols
- double buffering on the rx path
with the latter having higher priority.
Comments?
Regards
Oliver
^ permalink raw reply
* Re: [PATCH] dccp: handle invalid feature options length
From: Gerrit Renker @ 2011-05-06 19:57 UTC (permalink / raw)
To: Dan Rosenberg; +Cc: davem, dccp, netdev, linux-kernel, security
In-Reply-To: <1304688438.29544.16.camel@dan>
Quoting Dan Rosenberg:
| A length of zero (after subtracting two for the type and len fields) for
| the DCCPO_{CHANGE,CONFIRM}_{L,R} options will cause an underflow due to
| the subtraction. The subsequent code may read past the end of the
| options value buffer when parsing. I'm unsure of what the consequences
| of this might be, but it's probably not good.
|
Please disregard my earlier message, I erred.
Dan is right, his patch is correct and definitively valid.
A length of 0 would be cast to 0xff and then cause buffer overrun.
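The wraparound can be reproduced in isolation. The following is only a
sketch in plain C, not the DCCP parser and not Dan's patch: the function
names are hypothetical, and it assumes the value length is carried in an
8-bit type after one leading byte is removed from the option.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: length of the value part as an 8-bit quantity,
 * after removing one leading byte from an option of opt_len bytes.
 * With opt_len == 0 the result wraps around to 0xff.
 */
uint8_t value_len_unchecked(uint8_t opt_len)
{
	return (uint8_t)(opt_len - 1);
}

/* Same computation with a guard: returns -1 for a zero-length option. */
int value_len_checked(uint8_t opt_len)
{
	if (opt_len < 1)
		return -1;
	return opt_len - 1;
}

/* Small self-check showing both behaviours side by side. */
void option_len_self_check(void)
{
	assert(value_len_unchecked(0) == 0xff);	/* wrapped length */
	assert(value_len_checked(0) == -1);	/* rejected instead */
	assert(value_len_checked(3) == 2);	/* normal case unchanged */
}
```

Any loop that trusts the unchecked result would walk 255 bytes past a
value buffer that is in fact empty, which is why rejecting the zero
length before the subtraction is the safer order.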
| Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
| Cc: stable@kernel.org
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
^ permalink raw reply
* Re: [PATCH] dccp: handle invalid feature options length
From: David Miller @ 2011-05-06 20:04 UTC (permalink / raw)
To: gerrit; +Cc: drosenberg, dccp, netdev, linux-kernel, security
In-Reply-To: <20110506195733.GA3527@gerrit.erg.abdn.ac.uk>
From: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Date: Fri, 6 May 2011 21:57:33 +0200
> Quoting Dan Rosenberg:
> | A length of zero (after subtracting two for the type and len fields) for
> | the DCCPO_{CHANGE,CONFIRM}_{L,R} options will cause an underflow due to
> | the subtraction. The subsequent code may read past the end of the
> | options value buffer when parsing. I'm unsure of what the consequences
> | of this might be, but it's probably not good.
> |
> Please disregard my earlier message, I erred.
> Dan is right, his patch is correct and definitively valid.
> A length of 0 would be cast to 0xff and then cause buffer overrun.
>
> | Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
> | Cc: stable@kernel.org
> Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Great, I'll apply this, thanks!
^ permalink raw reply
* RE: [Pv-drivers] [PATCH] vmxnet3: Consistently disable irqs when taking adapter->cmd_lock
From: Scott Goldman @ 2011-05-06 20:10 UTC (permalink / raw)
To: Shreyas Bhatewara, Roland Dreier
Cc: pv-drivers@vmware.com, netdev@vger.kernel.org, David S. Miller
In-Reply-To: <alpine.LRH.2.00.1105061206500.26660@sbhatewara-dev1.eng.vmware.com>
>
> On Fri, 6 May 2011, Roland Dreier wrote:
>
> > From: Roland Dreier <roland@purestorage.com>
> >
> > Using the vmxnet3 driver produces a lockdep warning because
>
> > Signed-off-by: Roland Dreier <roland@purestorage.com>
>
>
> Roland, thanks for the analysis and the patch.
>
> Signed-off-by: Shreyas N Bhatewara <sbhatewara@vmware.com>
Likewise, seems pretty sane to me. The command register operations are only control-path operations and disabling interrupts for the duration is probably not a big deal. Touching the cmd reg will result in a VMEXIT, where the guest won't be processing interrupts anyway.
Signed-off-by: Scott J. Goldman <scottjg@vmware.com>
^ permalink raw reply
* Re: [Pv-drivers] [PATCH] vmxnet3: Consistently disable irqs when taking adapter->cmd_lock
From: David Miller @ 2011-05-06 20:12 UTC (permalink / raw)
To: scottjg; +Cc: sbhatewara, roland, pv-drivers, netdev
In-Reply-To: <F78BCF638F95D74A99D036114107EDB5028EFAA0F8@EXCH-MBX-3.vmware.com>
From: Scott Goldman <scottjg@vmware.com>
Date: Fri, 6 May 2011 13:10:29 -0700
>>
>> On Fri, 6 May 2011, Roland Dreier wrote:
>>
>> > From: Roland Dreier <roland@purestorage.com>
>> >
>> > Using the vmxnet3 driver produces a lockdep warning because
>>
>> > Signed-off-by: Roland Dreier <roland@purestorage.com>
>>
>>
>> Roland, thanks for the analysis and the patch.
>>
>> Signed-off-by: Shreyas N Bhatewara <sbhatewara@vmware.com>
>
> Likewise, seems pretty sane to me. The command register operations are only control-path operations and disabling interrupts for the duration is probably not a big deal. Touching the cmd reg will result in a VMEXIT, where the guest won't be processing interrupts anyway.
>
> Signed-off-by: Scott J. Goldman <scottjg@vmware.com>
Applied, thanks everyone.
^ permalink raw reply