* pci_function_reset from driver
From: Rajesh Borundia @ 2010-10-28 18:52 UTC (permalink / raw)
To: David Miller
Cc: netdev@vger.kernel.org, Ameen Rahman, Anirban Chakraborty,
Amit Salecha
Hi David,
For FLR supported device can I issue FLR on that
device from the driver ?
For example in kdump case it is recommended to reset
the device after loading of crash kernel.Can we issue FLR
from driver in such case(call some function like pci_reset_function).
Rajesh
^ permalink raw reply
* NULL pointer dereference at netxen_nic_probe+0x813/0x9a0
From: Bjorn Helgaas @ 2010-10-28 18:50 UTC (permalink / raw)
To: Amit Kumar Salecha; +Cc: netdev, linux-kernel
This is on current Linus upstream as of this morning (8128057)
on an HP DL785:
QLogic/NetXen Network Driver v4.0.74
netxen_nic 0000:07:00.0: PCI INT A -> GSI 30 (level, low) -> IRQ 30
netxen_nic 0000:07:00.0: setting latency timer to 64
netxen_nic 0000:07:00.0: 2MB memory map
netxen_nic 0000:07:00.0: loading firmware from flash
netxen_nic 0000:07:00.0: using 64-bit dma mask
kernel: Quad Gig LP Board S/N TI9ABK0266 Chip rev 0x42
netxen_nic 0000:07:00.0: firmware v4.0.520 [legacy]
netxen_nic 0000:07:00.0: irq 72 for MSI/MSI-X
netxen_nic 0000:07:00.0: irq 73 for MSI/MSI-X
netxen_nic 0000:07:00.0: irq 74 for MSI/MSI-X
netxen_nic 0000:07:00.0: irq 75 for MSI/MSI-X
netxen_nic 0000:07:00.0: using msi-x interrupts
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffff8160afda>] netxen_nic_probe+0x813/0x9a0
PGD 0
Oops: 0002 [#1] SMP
last sysfs file:
CPU 0
Modules linked in:
Pid: 1650, comm: work_for_cpu Not tainted 2.6.36-07338-g8128057 #269 /ProLiant DL785 G5
RIP: 0010:[<ffffffff8160afda>] [<ffffffff8160afda>] netxen_nic_probe+0x813/0x9a0
RSP: 0018:ffff8806138abe30 EFLAGS: 00010246
RAX: 0000000000000010 RBX: ffff8806139126c0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff880613895616 RDI: ffff880613912000
RBP: ffff8806138abe90 R08: 0000000000000000 R09: ffff8806138abb80
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880613912000
R13: ffff8812174f7000 R14: ffff880613912000 R15: ffff8812174f7000
FS: 0000000000000000(0000) GS:ffff8800cfa00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000010 CR3: 0000000001c07000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process work_for_cpu (pid: 1650, threadinfo ffff8806138aa000, task ffff880616f12be0)
Stack:
ffff8812174f7090 0000000000000246 ffff8806138abe90 ffff8812174f7000
00008806138abfd8 0000000000000282 68cd0025b30068cc ffff880c17439d30
ffff8812174f7090 ffff8812174f7000 ffff8812174f7208 0000000000000000
Call Trace:
[<ffffffff81203696>] local_pci_probe+0x48/0x91
[<ffffffff81052bae>] ? do_work_for_cpu+0x0/0x26
[<ffffffff81052bc1>] do_work_for_cpu+0x13/0x26
[<ffffffff81052bae>] ? do_work_for_cpu+0x0/0x26
[<ffffffff81057a7b>] kthread+0x81/0x89
[<ffffffff81003854>] kernel_thread_helper+0x4/0x10
[<ffffffff810579fa>] ? kthread+0x0/0x89
[<ffffffff81003850>] ? kernel_thread_helper+0x0/0x10
Code: 00 eb 15 49 8d bf 90 00 00 00 48 c7 c6 1b 2e aa 81 31 c0 e8 c0 4e cd ff 4c 89 f7 e8 d6 bb ee ff 49 8b 96 00 03 00 00 48 8d 42 10 <f0> 80 4a 10 01 4c 89 f7 e8 a3 7e ed ff 85 c0 41 89 c4 74 2a 49
RIP [<ffffffff8160afda>] netxen_nic_probe+0x813/0x9a0
RSP <ffff8806138abe30>
CR2: 0000000000000010
---[ end trace 059c7071bbf8de1f ]---
^ permalink raw reply
* Re: [PATCH] ip_gre: fix percpu stats accounting
From: David Miller @ 2010-10-28 18:49 UTC (permalink / raw)
To: eric.dumazet; +Cc: xemul, netdev
In-Reply-To: <1288291679.2711.1.camel@edumazet-laptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 28 Oct 2010 20:47:59 +0200
> Le jeudi 28 octobre 2010 à 11:41 -0700, David Miller a écrit :
>> I am still able to revert this I think without screwing up
>> publicly visible history, so I will double check and do the
>> revert if I can.
>
> Cool, I'll provide a patch in a couple of minutes, when tested.
Thanks.
^ permalink raw reply
* Re: [Security] TIPC security issues
From: David Miller @ 2010-10-28 18:49 UTC (permalink / raw)
To: andy.grover
Cc: torvalds, jon.maloy, netdev, drosenberg, security, allan.stephens,
rds-devel
In-Reply-To: <4CC9C4B0.50404@oracle.com>
From: Andy Grover <andy.grover@oracle.com>
Date: Thu, 28 Oct 2010 11:45:04 -0700
> Yes that's right, it's to map a memory region that will be the target
> of an RDMA operation. I don't know why struct rds_iovec was used
> instead of struct iovec, but I think we're stuck, since it's part of
> our socket API.
>
> I'll send DaveM patches to fix those two immediately-identified
> problems today, and we'll take a good long look at the rest of the
> code for further issues.
FWIW, I would strongly suggest that you copy the iovecs into the
kernel before parsing them like sys_sendmsg() and sys_recvmsg() do in
net/socket.c as part of these fixes.
^ permalink raw reply
* Re: [Security] TIPC security issues
From: Andy Grover @ 2010-10-28 18:45 UTC (permalink / raw)
To: Linus Torvalds
Cc: David Miller, jon.maloy, netdev, drosenberg, security,
allan.stephens, RDS Devel
In-Reply-To: <AANLkTikdh-kqroCDA5EsjSMTLjViqe_U=hxM1L6U4Ppb@mail.gmail.com>
On 10/28/2010 08:32 AM, Linus Torvalds wrote:
> Heh. We apparently have _another_ iovec overflow in networking. This time rds.
>
> Reported by Thomas Pollet<thomas.pollet@gmail.com>: look at
> net/rds/rdma.c around line 490. It doesn't use the regular iovec code,
> instead it cooks its own, and has a few problems with overflow.
>
> It gathers the number of pages into an "unsigned int", and for each
> entry in its own rds_iovec it will check that the size is< UINT_MAX,
> and then generate the number of pages for that entry. With the whole
> "unaligned address adds one" logic, it means that each entry can get
> (UINT_MAX>> PAGE_SHIFT)+1 pages.
FWIW both the signed issue and not checking the iovec changed were
correct in 2.6.36, and only added in ff87e97.
> And how many entries can we have? Apparently that is capped to
> UINT_MAX too. So add all those up, and they can easily overflow the
> unsigned int page counter.
>
> So this time fixing verify_iovec() doesn't help, because rds just
> cooks its own, and this is using a totally different interface: it
> seems to hook into sendmsg, but it looks like it uses the ancillary
> data objects and passes in its own magical iovec rather than use any
> "normal" iovec thing. I don't know the code, I may be totally off.
Yes that's right, it's to map a memory region that will be the target of
an RDMA operation. I don't know why struct rds_iovec was used instead of
struct iovec, but I think we're stuck, since it's part of our socket API.
I'll send DaveM patches to fix those two immediately-identified problems
today, and we'll take a good long look at the rest of the code for
further issues.
Regards -- Andy
^ permalink raw reply
* Re: [PATCH] ip_gre: fix percpu stats accounting
From: Eric Dumazet @ 2010-10-28 18:47 UTC (permalink / raw)
To: David Miller; +Cc: xemul, netdev
In-Reply-To: <20101028.114102.112602886.davem@davemloft.net>
Le jeudi 28 octobre 2010 à 11:41 -0700, David Miller a écrit :
> I am still able to revert this I think without screwing up
> publicly visible history, so I will double check and do the
> revert if I can.
Cool, I'll provide a patch in a couple of minutes, when tested.
Thanks
^ permalink raw reply
* Re: [PATCH] ip_gre: fix percpu stats accounting
From: David Miller @ 2010-10-28 18:41 UTC (permalink / raw)
To: eric.dumazet; +Cc: xemul, netdev
In-Reply-To: <1288291138.2711.0.camel@edumazet-laptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 28 Oct 2010 20:38:58 +0200
> Le jeudi 28 octobre 2010 à 10:29 -0700, David Miller a écrit :
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Thu, 28 Oct 2010 18:33:54 +0200
>>
>> > Le jeudi 28 octobre 2010 à 20:07 +0400, Pavel Emelyanov a écrit :
>> >> commit e985aad7 (ip_gre: percpu stats accounting) forgot the fallback
>> >> tunnel case (gre0).
>> >>
>> >> This is the 4th part of the "foo: fix percpu stats accounting" series ;)
>> >>
>> >> Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
>> >>
>> >
>> > Indeed, right you are ;)
>> >
>> > Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
>>
>> Applied.
>
>
> Hmm, actually patch is buggy, sorry...
I am still able to revert this I think without screwing up
publicly visible history, so I will double check and do the
revert if I can.
^ permalink raw reply
* Re: [PATCH] ip_gre: fix percpu stats accounting
From: Eric Dumazet @ 2010-10-28 18:38 UTC (permalink / raw)
To: David Miller; +Cc: xemul, netdev
In-Reply-To: <20101028.102917.246527148.davem@davemloft.net>
Le jeudi 28 octobre 2010 à 10:29 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 28 Oct 2010 18:33:54 +0200
>
> > Le jeudi 28 octobre 2010 à 20:07 +0400, Pavel Emelyanov a écrit :
> >> commit e985aad7 (ip_gre: percpu stats accounting) forgot the fallback
> >> tunnel case (gre0).
> >>
> >> This is the 4th part of the "foo: fix percpu stats accounting" series ;)
> >>
> >> Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
> >>
> >
> > Indeed, right you are ;)
> >
> > Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
>
> Applied.
Hmm, actually patch is buggy, sorry...
^ permalink raw reply
* Re: [net-next] stmmac: enable/disable rx/tx in the core with a single write.
From: David Miller @ 2010-10-28 18:38 UTC (permalink / raw)
To: peppe.cavallaro; +Cc: netdev, armando.visconti
In-Reply-To: <1288069094-25365-1-git-send-email-peppe.cavallaro@st.com>
From: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
Date: Tue, 26 Oct 2010 06:58:14 +0200
> From: avisconti <armando.visconti@st.com>
>
> This patch enables and disables the rx and tx bits in the MAC control reg
> by using a single write operation.
> This also solves a possible problem (spotted on SPEAr platforms) at 10Mbps
> where two consecutive writes to a MAC control register can take more than
> 4 phy_clk cycles.
>
> Signed-off-by: Armando Visconti <armando.visconti@st.com>
> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Applied, thanks.
^ permalink raw reply
* Re: [PATCH] net: Limit socket I/O iovec total length to INT_MAX.
From: David Miller @ 2010-10-28 18:37 UTC (permalink / raw)
To: torvalds; +Cc: netdev, drosenberg, jon.maloy, allan.stephens
In-Reply-To: <AANLkTinq5iYU3A41vLHLLWVpoS-A4mB-RYrrWYCGKk+-@mail.gmail.com>
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Thu, 28 Oct 2010 11:33:56 -0700
> On Thu, Oct 28, 2010 at 11:22 AM, David Miller <davem@davemloft.net> wrote:
>>
>> - int tot_len = 0;
>> + size_t tot_len = 0;
>
> I would actually keep "tot_len" as an "int".
...
>> +int verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode)
>> {
>> int size, ct;
>> - long err;
>> + size_t err;
>
> Same thing here. Making "err" be an "int" is actually the right thing
> to do, because then it matches the return type (iow, if it was any
> other type, there would be an implicit cast, and if it didn't fit in
> "int", that would be a bug anyway).
Yep, agreed on all counts, I'll make those changes.
^ permalink raw reply
* Re: [PATCH] net: Limit socket I/O iovec total length to INT_MAX.
From: Linus Torvalds @ 2010-10-28 18:33 UTC (permalink / raw)
To: David Miller; +Cc: netdev, drosenberg, jon.maloy, allan.stephens
In-Reply-To: <20101028.112231.232747062.davem@davemloft.net>
On Thu, Oct 28, 2010 at 11:22 AM, David Miller <davem@davemloft.net> wrote:
>
> - int tot_len = 0;
> + size_t tot_len = 0;
I would actually keep "tot_len" as an "int".
The whole point of this:
> + if (len > INT_MAX - tot_len)
> + len = INT_MAX - tot_len;
> +
> tot_len += len;
Is that "tot_len" can _never_ become larger than INT_MAX, because we
never add a "len" that would make it bigger than that.
So "len" itself should be the correct unsigned size_t (so that the
"len > INT_MAX - tot_len" thing is done as an unsigned comparison),
but "tot_len" itself is very much designed to fit in "int".
> +int verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode)
> {
> int size, ct;
> - long err;
> + size_t err;
Same thing here. Making "err" be an "int" is actually the right thing
to do, because then it matches the return type (iow, if it was any
other type, there would be an implicit cast, and if it didn't fit in
"int", that would be a bug anyway).
Linus
^ permalink raw reply
* Re: [PATCH] net: atarilance - flags should be unsigned long
From: David Miller @ 2010-10-28 18:35 UTC (permalink / raw)
To: geert; +Cc: akpm, netdev, linux-kernel, linux-m68k
In-Reply-To: <alpine.DEB.2.00.1010282030530.29788@ayla.of.borg>
From: Geert Uytterhoeven <geert@linux-m68k.org>
Date: Thu, 28 Oct 2010 20:31:53 +0200 (CEST)
> drivers/net/atarilance.c: In function ‘addr_accessible’:
> drivers/net/atarilance.c:413: warning: comparison of distinct pointer types lacks a cast
> drivers/net/atarilance.c:450: warning: comparison of distinct pointer types lacks a cast
>
> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Applied, thanks.
^ permalink raw reply
* Re: [PATCH 1/1] netxen: fix kdump
From: David Miller @ 2010-10-28 18:34 UTC (permalink / raw)
To: amit.salecha; +Cc: netdev, ameen.rahman, anirban.chakraborty, rajesh.borundia
In-Reply-To: <1288169510-4655-1-git-send-email-amit.salecha@qlogic.com>
From: Amit Kumar Salecha <amit.salecha@qlogic.com>
Date: Wed, 27 Oct 2010 01:51:50 -0700
> From: Rajesh Borundia <rajesh.borundia@qlogic.com>
>
> Reset the whole hw instead of freeing hw resources
> consumed by each pci function.
>
> Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
> Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Applied, thanks.
^ permalink raw reply
* Re: [PATCH] pktgen: Limit how much data we copy onto the stack.
From: David Miller @ 2010-10-28 18:32 UTC (permalink / raw)
To: nelhage; +Cc: eric.dumazet, robert.olsson, andy.shevchenko, netdev, eugene
In-Reply-To: <1288279242-22153-1-git-send-email-nelhage@ksplice.com>
From: Nelson Elhage <nelhage@ksplice.com>
Date: Thu, 28 Oct 2010 11:20:42 -0400
> A program that accidentally writes too much data to the pktgen file can overflow
> the kernel stack and oops the machine. This is only triggerable by root, so
> there's no security issue, but it's still an unfortunate bug.
>
> printk() won't print more than 1024 bytes in a single call, anyways, so let's
> just never copy more than that much data. We're on a fairly shallow stack, so
> that should be safe even with CONFIG_4KSTACKS.
>
> Signed-off-by: Nelson Elhage <nelhage@ksplice.com>
Applied, thanks a lot.
^ permalink raw reply
* [PATCH] net: atarilance - flags should be unsigned long
From: Geert Uytterhoeven @ 2010-10-28 18:31 UTC (permalink / raw)
To: David S. Miller, Andrew Morton
Cc: netdev, Linux Kernel Development, Linux/m68k
drivers/net/atarilance.c: In function ‘addr_accessible’:
drivers/net/atarilance.c:413: warning: comparison of distinct pointer types lacks a cast
drivers/net/atarilance.c:450: warning: comparison of distinct pointer types lacks a cast
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
---
drivers/net/atarilance.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/net/atarilance.c b/drivers/net/atarilance.c
index 3134e53..8cb27cb 100644
--- a/drivers/net/atarilance.c
+++ b/drivers/net/atarilance.c
@@ -407,7 +407,7 @@ static noinline int __init addr_accessible(volatile void *regp, int wordflag,
int writeflag)
{
int ret;
- long flags;
+ unsigned long flags;
long *vbr, save_berr;
local_irq_save(flags);
--
1.7.0.4
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply related
* [PATCH] net: Limit socket I/O iovec total length to INT_MAX.
From: David Miller @ 2010-10-28 18:22 UTC (permalink / raw)
To: torvalds; +Cc: netdev, drosenberg, jon.maloy, allan.stephens
This helps protect us from overflow issues down in the
individual protocol sendmsg/recvmsg handlers. Once
we hit INT_MAX we truncate out the rest of the iovec
by setting the iov_len members to zero.
This works because:
1) For SOCK_STREAM and SOCK_SEQPACKET sockets, partial
writes are allowed and the application will just continue
with another write to send the rest of the data.
2) For datagram oriented sockets, where there must be a
one-to-one correspondance between write() calls and
packets on the wire, INT_MAX is going to be far larger
than the packet size limit the protocol is going to
check for and signal with -EMSGSIZE.
Based upon a patch by Linus Torvalds.
Signed-off-by: David S. Miller <davem@davemloft.net>
---
Ok, this is the patch I am testing right now. It ought to
plug the TIPC holes wrt. handling iovecs given by the
user.
I'll look at the recently discovered RDS crap next :-/
include/linux/socket.h | 2 +-
net/compat.c | 12 +++++++-----
net/core/iovec.c | 19 +++++++++----------
3 files changed, 17 insertions(+), 16 deletions(-)
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 5146b50..86b652f 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -322,7 +322,7 @@ extern int csum_partial_copy_fromiovecend(unsigned char *kdata,
int offset,
unsigned int len, __wsum *csump);
-extern long verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode);
+extern int verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode);
extern int memcpy_toiovec(struct iovec *v, unsigned char *kdata, int len);
extern int memcpy_toiovecend(const struct iovec *v, unsigned char *kdata,
int offset, int len);
diff --git a/net/compat.c b/net/compat.c
index 63d260e..71bfd8e 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -34,17 +34,19 @@ static inline int iov_from_user_compat_to_kern(struct iovec *kiov,
struct compat_iovec __user *uiov32,
int niov)
{
- int tot_len = 0;
+ size_t tot_len = 0;
while (niov > 0) {
compat_uptr_t buf;
compat_size_t len;
if (get_user(len, &uiov32->iov_len) ||
- get_user(buf, &uiov32->iov_base)) {
- tot_len = -EFAULT;
- break;
- }
+ get_user(buf, &uiov32->iov_base))
+ return -EFAULT;
+
+ if (len > INT_MAX - tot_len)
+ len = INT_MAX - tot_len;
+
tot_len += len;
kiov->iov_base = compat_ptr(buf);
kiov->iov_len = (__kernel_size_t) len;
diff --git a/net/core/iovec.c b/net/core/iovec.c
index 72aceb1..e7f5b29 100644
--- a/net/core/iovec.c
+++ b/net/core/iovec.c
@@ -35,10 +35,10 @@
* in any case.
*/
-long verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode)
+int verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode)
{
int size, ct;
- long err;
+ size_t err;
if (m->msg_namelen) {
if (mode == VERIFY_READ) {
@@ -62,14 +62,13 @@ long verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address,
err = 0;
for (ct = 0; ct < m->msg_iovlen; ct++) {
- err += iov[ct].iov_len;
- /*
- * Goal is not to verify user data, but to prevent returning
- * negative value, which is interpreted as errno.
- * Overflow is still possible, but it is harmless.
- */
- if (err < 0)
- return -EMSGSIZE;
+ size_t len = iov[ct].iov_len;
+
+ if (len > INT_MAX - err) {
+ len = INT_MAX - err;
+ iov[ct].iov_len = len;
+ }
+ err += len;
}
return err;
--
1.7.3.2
^ permalink raw reply related
* Re: [PATCH] net: Allocate RX and TX queues in alloc_netdev_mq
From: Tom Herbert @ 2010-10-28 18:11 UTC (permalink / raw)
To: David Miller; +Cc: netdev
In-Reply-To: <20101028.111001.70188960.davem@davemloft.net>
On Thu, Oct 28, 2010 at 11:10 AM, David Miller <davem@davemloft.net> wrote:
>
> Did you mean to actually say something other than quoting everything
> said so far? :-)
>
You're response was so nicely worded that I had to repeat it :-)
Anyway, thanks for the clarification!
>
^ permalink raw reply
* Re: [PATCH] net: Allocate RX and TX queues in alloc_netdev_mq
From: David Miller @ 2010-10-28 18:10 UTC (permalink / raw)
To: therbert; +Cc: netdev
In-Reply-To: <AANLkTik0rO5wcMtvoys__p07deATp0wUjDx=24oLYcfS@mail.gmail.com>
Did you mean to actually say something other than quoting everything
said so far? :-)
^ permalink raw reply
* Re: [PATCH] net: Allocate RX and TX queues in alloc_netdev_mq
From: Tom Herbert @ 2010-10-28 18:08 UTC (permalink / raw)
To: David Miller; +Cc: netdev
In-Reply-To: <20101028.104942.179929434.davem@davemloft.net>
On Thu, Oct 28, 2010 at 10:49 AM, David Miller <davem@davemloft.net> wrote:
> From: Tom Herbert <therbert@google.com>
> Date: Thu, 28 Oct 2010 10:45:45 -0700 (PDT)
>
>> Both TX and RX queue allocations in the netdev structure had been moved
>> to register_device with the idea that the number of queues could be
>> changed between device allocation and registration. It was determined
>> that changing the num_queues after the call to alloc_netdev_mq was not
>> a good idea, so that was abandoned. Also, some drivers call
>> netif_queue_start without registering device, which causes panic
>> when dev->_tx queue structure is accessed. This patch moves queue
>> allocations back to alloc_netdev_mq to fix these issues.
>>
>> Signed-off-by: Tom Herbert <therbert@google.com>
>
> Changing the TX queue state with a netif_stop_queue() call or
> similar was always completely pointless and frankly a bug.
>
> So that should not influence the motivation behind this change
> at all.
>
> In fact I _like_ that it crashes now so that we are forced
> to fix these cases.
>
> Really, the queue state is absolutely immaterial during
> device allocation and registration. It's state is
> %100 "don't care" until ->open() is invoked.
>
> So any code that touches the queue state at these earlier points in
> time is completely extraneous if not broken.
>
^ permalink raw reply
* Re: [PATCH] net: Allocate RX and TX queues in alloc_netdev_mq
From: David Miller @ 2010-10-28 17:49 UTC (permalink / raw)
To: therbert; +Cc: netdev
In-Reply-To: <alpine.DEB.1.00.1010281036350.22486@pokey.mtv.corp.google.com>
From: Tom Herbert <therbert@google.com>
Date: Thu, 28 Oct 2010 10:45:45 -0700 (PDT)
> Both TX and RX queue allocations in the netdev structure had been moved
> to register_device with the idea that the number of queues could be
> changed between device allocation and registration. It was determined
> that changing the num_queues after the call to alloc_netdev_mq was not
> a good idea, so that was abandoned. Also, some drivers call
> netif_queue_start without registering device, which causes panic
> when dev->_tx queue structure is accessed. This patch moves queue
> allocations back to alloc_netdev_mq to fix these issues.
>
> Signed-off-by: Tom Herbert <therbert@google.com>
Changing the TX queue state with a netif_stop_queue() call or
similar was always completely pointless and frankly a bug.
So that should not influence the motivation behind this change
at all.
In fact I _like_ that it crashes now so that we are forced
to fix these cases.
Really, the queue state is absolutely immaterial during
device allocation and registration. It's state is
%100 "don't care" until ->open() is invoked.
So any code that touches the queue state at these earlier points in
time is completely extraneous if not broken.
^ permalink raw reply
* [PATCH] net: Allocate RX and TX queues in alloc_netdev_mq
From: Tom Herbert @ 2010-10-28 17:45 UTC (permalink / raw)
To: davem, netdev
Both TX and RX queue allocations in the netdev structure had been moved
to register_device with the idea that the number of queues could be
changed between device allocation and registration. It was determined
that changing the num_queues after the call to alloc_netdev_mq was not
a good idea, so that was abandoned. Also, some drivers call
netif_queue_start without registering device, which causes panic
when dev->_tx queue structure is accessed. This patch moves queue
allocations back to alloc_netdev_mq to fix these issues.
Signed-off-by: Tom Herbert <therbert@google.com>
---
net/core/dev.c | 38 ++++++++++++++++++--------------------
1 files changed, 18 insertions(+), 20 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index b2269ac..1f729d6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5009,10 +5009,10 @@ void netif_stacked_transfer_operstate(const struct net_device *rootdev,
}
EXPORT_SYMBOL(netif_stacked_transfer_operstate);
-static int netif_alloc_rx_queues(struct net_device *dev)
+static int netif_alloc_rx_queues(struct net_device *dev, int count)
{
#ifdef CONFIG_RPS
- unsigned int i, count = dev->num_rx_queues;
+ int i;
struct netdev_rx_queue *rx;
BUG_ON(count < 1);
@@ -5030,13 +5030,15 @@ static int netif_alloc_rx_queues(struct net_device *dev)
*/
for (i = 0; i < count; i++)
rx[i].first = rx;
+
+ dev->num_rx_queues = count;
+ dev->real_num_rx_queues = count;
#endif
return 0;
}
-static int netif_alloc_netdev_queues(struct net_device *dev)
+static int netif_alloc_netdev_queues(struct net_device *dev, int count)
{
- unsigned int count = dev->num_tx_queues;
struct netdev_queue *tx;
BUG_ON(count < 1);
@@ -5048,6 +5050,10 @@ static int netif_alloc_netdev_queues(struct net_device *dev)
return -ENOMEM;
}
dev->_tx = tx;
+
+ dev->num_tx_queues = count;
+ dev->real_num_tx_queues = count;
+
return 0;
}
@@ -5105,14 +5111,6 @@ int register_netdevice(struct net_device *dev)
dev->iflink = -1;
- ret = netif_alloc_rx_queues(dev);
- if (ret)
- goto out;
-
- ret = netif_alloc_netdev_queues(dev);
- if (ret)
- goto out;
-
netdev_init_queues(dev);
/* Init, if this function is available */
@@ -5558,6 +5556,12 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
dev = PTR_ALIGN(p, NETDEV_ALIGN);
dev->padded = (char *)dev - (char *)p;
+ if (netif_alloc_netdev_queues(dev, queue_count))
+ goto free_p;
+
+ if (netif_alloc_rx_queues(dev, queue_count))
+ goto free_p;
+
dev->pcpu_refcnt = alloc_percpu(int);
if (!dev->pcpu_refcnt)
goto free_p;
@@ -5570,14 +5574,6 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
dev_net_set(dev, &init_net);
- dev->num_tx_queues = queue_count;
- dev->real_num_tx_queues = queue_count;
-
-#ifdef CONFIG_RPS
- dev->num_rx_queues = queue_count;
- dev->real_num_rx_queues = queue_count;
-#endif
-
dev->gso_max_size = GSO_MAX_SIZE;
INIT_LIST_HEAD(&dev->ethtool_ntuple_list.list);
@@ -5593,6 +5589,8 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
free_pcpu:
free_percpu(dev->pcpu_refcnt);
free_p:
+ kfree(dev->_tx);
+ kfree(dev->_rx);
kfree(p);
return NULL;
}
--
1.7.3.1
^ permalink raw reply related
* Re: [PATCH] USB: gadget: fix ethernet gadget crash in gether_setup
From: David Miller @ 2010-10-28 17:32 UTC (permalink / raw)
To: mad_soft; +Cc: linux-usb, dbrownell, gregkh, netdev, linux-kernel, therbert
In-Reply-To: <1288253909-12084-1-git-send-email-mad_soft@inbox.ru>
From: Dmitry Artamonow <mad_soft@inbox.ru>
Date: Thu, 28 Oct 2010 12:18:29 +0400
> Crash is triggered by commit e6484930d7 ("net: allocate tx queues in
> register_netdevice"), which moved tx netqueue creation into register_netdev.
> So now calling netif_stop_queue() before register_netdev causes an oops.
> Move netif_stop_queue() after net device registration to fix crash.
>
> Signed-off-by: Dmitry Artamonow <mad_soft@inbox.ru>
Applied, thank you.
^ permalink raw reply
* Re: [PATCH] Fix missing dependency of PCH_GBE driver
From: Steven Rostedt @ 2010-10-28 17:31 UTC (permalink / raw)
To: David Miller; +Cc: linux-kernel, netdev, error27, masa-korg, akpm
In-Reply-To: <20101028.101618.226776102.davem@davemloft.net>
On Thu, 2010-10-28 at 10:16 -0700, David Miller wrote:
> The canonical way to handle this is to use 'select'. Thus,
> I've committed the following to fix this.
Yeah, I was thinking that too after I posted it.
-- Steve
^ permalink raw reply
* Re: [PATCH] ip_gre: fix percpu stats accounting
From: David Miller @ 2010-10-28 17:29 UTC (permalink / raw)
To: eric.dumazet; +Cc: xemul, netdev
In-Reply-To: <1288283634.11523.20.camel@edumazet-laptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 28 Oct 2010 18:33:54 +0200
> Le jeudi 28 octobre 2010 à 20:07 +0400, Pavel Emelyanov a écrit :
>> commit e985aad7 (ip_gre: percpu stats accounting) forgot the fallback
>> tunnel case (gre0).
>>
>> This is the 4th part of the "foo: fix percpu stats accounting" series ;)
>>
>> Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
>>
>
> Indeed, right you are ;)
>
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Applied.
^ permalink raw reply
* Re: [PATCH] fib: Fix fib zone and its hash leak on namespace stop
From: David Miller @ 2010-10-28 17:28 UTC (permalink / raw)
To: xemul; +Cc: netdev
In-Reply-To: <4CC965EB.60600@parallels.com>
From: Pavel Emelyanov <xemul@parallels.com>
Date: Thu, 28 Oct 2010 16:00:43 +0400
> When we stop a namespace we flush the table and free one, but the
> added fn_zone-s (and their hashes if grown) are leaked. Need to free.
> Tries releases all its stuff in the flushing code.
>
> Shame on us - this bug exists since the very first make-fib-per-net
> patches in 2.6.27 :(
>
> Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Applied.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox