* multi-machine simultaneous kernel panic in tcp_transmit_kcb
From: Doug Hughes @ 2010-10-28 1:04 UTC (permalink / raw)
To: netdev
3 machines within 1 minute of each other (odd, by itself, but not the
root of the question).
2 of this:
2.6.18-164.15.1.el5 #1 SMP Wed Mar 17 11:30:06 EDT 2010 x86_64 x86_64
x86_64 GNU/Linux
(I have a screen shot on the kvm)
all Cent 5.4
1 Xen instances with 2.6.18-128.1.14.el5xen #1 SMP Wed Jun 17 07:10:16
EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
a slightly older kernel but crashed within one minute of the other two.
Since it's a xen, I have a text traceback:
Pid: 0, comm: swapper Not tainted 2.6.18-128.1.14.el5xen #1
RIP: e030:[<ffffffff8040e077>] [<ffffffff8040e077>] pskb_copy+0x133/0x1b1
RSP: e02b:ffffffff8066ade0 EFLAGS: 00010282
RAX: ffff8800325fa120 RBX: ffff8800434f5780 RCX: ffff88006d311930
RDX: 656363612f647074 RSI: ffff8800325fa130 RDI: 0000000000000002
RBP: ffff8800549aa680 R08: 7ffffffffffffffe R09: 0000000000000000
R10: ffff8800434f5780 R11: 00000000000000c8 R12: 0000000000000220
R13: ffff8800549aa680 R14: 0000000000000000 R15: ffffffffff578000
FS: 00002b84514af260(0000) GS:ffffffff805ba000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000
Process swapper (pid: 0, threadinfo ffffffff8062a000, task ffffffff804e0a80)
Stack: ffffffff802886d9 ffff88006ad3d280 0000000000000001 ffffffff80222485
ffff880025665380 000000017d0f80ab 0000000000000001 ffff88006ad3d280
ffff8800549aa680 00000000ffffff8f
Call Trace:
<IRQ> [<ffffffff802886d9>] rebalance_tick+0x18b/0x3d4
[<ffffffff80222485>] tcp_transmit_skb+0x73/0x667
[<ffffffff8043903a>] tcp_retransmit_skb+0x53d/0x638
[<ffffffff8043a569>] tcp_write_timer+0x0/0x68e
[<ffffffff8043a9d6>] tcp_write_timer+0x46d/0x68e
[<ffffffff80291f8b>] run_timer_softirq+0x13f/0x1c6
[<ffffffff802130d6>] __do_softirq+0x8d/0x13b
[<ffffffff80260da4>] call_softirq+0x1c/0x278
[<ffffffff8026e0be>] do_softirq+0x31/0x98
[<ffffffff8026df39>] do_IRQ+0xec/0xf5
[<ffffffff803a7b94>] evtchn_do_upcall+0x13b/0x1fb
[<ffffffff802608d6>] do_hypervisor_callback+0x1e/0x2c
<EOI> [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
[<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
[<ffffffff8026f511>] raw_safe_halt+0x84/0xa8
[<ffffffff8026ca52>] xen_idle+0x38/0x4a
[<ffffffff8024b0d8>] cpu_idle+0x97/0xba
[<ffffffff80634b09>] start_kernel+0x21f/0x224
[<ffffffff806341e5>] _sinittext+0x1e5/0x1eb
Code: 48 8b 02 25 00 40 02 00 48 3d 00 40 02 00 75 04 48 8b 52 10
RIP [<ffffffff8040e077>] pskb_copy+0x133/0x1b1
RSP <ffffffff8066ade0>
<0>Kernel panic - not syncing: Fatal exception
---
The first 4 lines of the trace on the xen and the non-xen are the same
except for the addresses.
In fact, they are the same up until the 9th line where they start to
diverge a little bit.
The last thing in the kern log before the crash on one was an nfs server
not responding, but those happen sporadically and often enough that I
don't suspect it's related.
Given that its looks, seemed like an appropriate question for netdev
(following a failed google search)
^ permalink raw reply
* cxgb3: kernel access of bad area with v2.6.36-6794-g12ba8d1
From: Nishanth Aravamudan @ 2010-10-28 1:54 UTC (permalink / raw)
To: Divy Le Ray; +Cc: sonnyrao, netdev, linux-kernel
Hi,
I'm seeing the following trace w/ current git on a machine in our lab:
Chelsio T3 Network Driver - version 1.1.4-ko
cxgb3 0003:01:00.0: enabling device (0140 -> 0142)
Unable to handle kernel paging request for data at address 0x00000010
Faulting instruction address: 0xd000000008473ae8
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=32 NUMA pSeries
last sysfs file: /sys/devices/virtual/block/dm-0/dev
Modules linked in: cxgb3(+) mdio ehea ib_ehca ib_core ext4 jbd2 mbcache sd_mod crc_t10dif ipr dm_mod [last unloaded: scsi_wait_scan]
NIP: d000000008473ae8 LR: d000000008473ac4 CTR: c0000000004398a0
REGS: c0000007a157f190 TRAP: 0300 Not tainted (2.6.36)
MSR: 8000000000009032 <EE,ME,IR,DR> CR: 24424444 XER: 00000000
DAR: 0000000000000010, DSISR: 0000000040000000
TASK = c0000007a3755290[741] 'modprobe' THREAD: c0000007a157c000 CPU: 24
GPR00: 0000000000000000 c0000007a157f410 d000000008486978 c0000007a526c000
GPR04: c0000000006d25dd c0000007a526c005 c0000007a526c29e 0000000000000002
GPR08: 0000000000000004 0000000000000010 c0000007a526c0a0 0000000000000000
GPR12: d000000008474aa8 c00000000eed3c00 d00000000847aeb8 0000000000000001
GPR16: 0000000000001000 0000000000000000 d000000008477aa8 00003c047ef7e000
GPR20: c0000007a8b7d280 c0000007a8b7d310 d00000000847d1c0 d00000000847d1d8
GPR24: 0000000000000003 00003c047ef7efff 0000000000000001 c0000007a3c1c000
GPR28: 0000000000000000 c0000007a526c000 d000000008484210 c0000007a3c1c000
NIP [d000000008473ae8] .init_one+0x510/0xb7c [cxgb3]
LR [d000000008473ac4] .init_one+0x4ec/0xb7c [cxgb3]
Call Trace:
[c0000007a157f410] [d000000008473ac4] .init_one+0x4ec/0xb7c [cxgb3] (unreliable)
[c0000007a157f560] [c0000000002e40bc] .local_pci_probe+0x7c/0x100
[c0000007a157f5f0] [c0000000002e5018] .pci_device_probe+0x148/0x150
[c0000007a157f6a0] [c00000000034df68] .driver_probe_device+0x128/0x330
[c0000007a157f750] [c00000000034e27c] .__driver_attach+0x10c/0x110
[c0000007a157f7e0] [c00000000034d15c] .bus_for_each_dev+0x9c/0xf0
[c0000007a157f890] [c00000000034dbc8] .driver_attach+0x28/0x40
[c0000007a157f910] [c00000000034c648] .bus_add_driver+0x218/0x3d0
[c0000007a157f9c0] [c00000000034e718] .driver_register+0x98/0x1d0
[c0000007a157fa60] [c0000000002e5354] .__pci_register_driver+0x64/0x140
[c0000007a157fb00] [d000000008474278] .cxgb3_init_module+0x2c/0x44 [cxgb3]
[c0000007a157fb80] [c000000000009754] .do_one_initcall+0x64/0x1e0
[c0000007a157fc40] [c0000000000d28b8] .SyS_init_module+0x1b8/0x1790
[c0000007a157fe30] [c000000000008564] syscall_exit+0x0/0x40
Instruction dump:
9b890018 9b090019 48000fe9 e8410028 801d0308 2f800000 419e003c 39600000
e93d0300 796045e4 7d290214 39290010 <7c0048a8> 7c00d378 7c0049ad 40a2fff4
---[ end trace 2a530df8c4ad3d70 ]---
udevd-work[600]: '/sbin/modprobe -b pci:v00001425d00000030sv00001014sd0000038Cbc02sc00i00' unexpected exit with status 0x000b
I did an objdump -ldr of cxgb3.ko and:
4c0: 48 00 00 01 bl 4c0 <.init_one+0x4c0>
4c0: R_PPC64_REL24 .alloc_etherdev_mq
4c4: 60 00 00 00 nop
4c8: 7c 7d 1b 79 mr. r29,r3
4cc: 41 82 03 28 beq- 7f4 <.init_one+0x7f4>
4d0: 39 3d 07 00 addi r9,r29,1792
4d4: fa bd 03 f8 std r21,1016(r29)
4d8: fb bb 32 08 std r29,12808(r27)
4dc: fb fd 07 00 std r31,1792(r29)
4e0: 9b 89 00 18 stb r28,24(r9)
4e4: 9b 09 00 19 stb r24,25(r9)
4e8: 48 00 00 01 bl 4e8 <.init_one+0x4e8>
4e8: R_PPC64_REL24 .netif_carrier_off
4ec: 60 00 00 00 nop
4f0: 80 1d 03 08 lwz r0,776(r29)
4f4: 2f 80 00 00 cmpwi cr7,r0,0
4f8: 41 9e 00 3c beq- cr7,534 <.init_one+0x534>
4fc: 39 60 00 00 li r11,0
500: e9 3d 03 00 ld r9,768(r29)
504: 79 60 45 e4 rldicr r0,r11,8,55
508: 7d 29 02 14 add r9,r9,r0
50c: 39 29 00 10 addi r9,r9,16
510: 7c 00 48 a8 ldarx r0,0,r9
514: 7c 00 d3 78 or r0,r0,r26
518: 7c 00 49 ad stdcx. r0,0,r9
51c: 40 a2 ff f4 bne- 510 <.init_one+0x510>
So I'm guessing it's somewhere in here:
for (i = 0; i < ai->nports0 + ai->nports1; ++i) {
struct net_device *netdev;
netdev = alloc_etherdev_mq(sizeof(struct port_info), SGE_QSETS);
if (!netdev) {
err = -ENOMEM;
goto out_free_dev;
}
SET_NETDEV_DEV(netdev, &pdev->dev);
adapter->port[i] = netdev;
pi = netdev_priv(netdev);
pi->adapter = adapter;
pi->rx_offload = T3_RX_CSUM | T3_LRO;
pi->port_id = i;
netif_carrier_off(netdev);
netif_tx_stop_all_queues(netdev);
netdev->irq = pdev->irq;
netdev->mem_start = mmio_start;
netdev->mem_end = mmio_start + mmio_len - 1;
netdev->features |= NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO;
netdev->features |= NETIF_F_GRO;
if (pci_using_dac)
netdev->features |= NETIF_F_HIGHDMA;
netdev->features |= NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX;
netdev->netdev_ops = &cxgb_netdev_ops;
SET_ETHTOOL_OPS(netdev, &cxgb_ethtool_ops);
}
Well, presuming the trace is mostly accurate? I'm not sure what else is
needed to determine the problem further. I'm building 2.6.36 as I write
this. But it doesn't seem like this code has changed much and I had a
working kernel around 2.6.36-rc7...
Let me know what else I can do to help debug.
Thanks,
Nish
--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
^ permalink raw reply
* Re: [PATCH] igb: Fix unused variable warning.
From: Jeff Kirsher @ 2010-10-28 2:13 UTC (permalink / raw)
To: Jesse Gross; +Cc: David Miller, netdev
In-Reply-To: <1288140963-20537-1-git-send-email-jesse@nicira.com>
On Tue, Oct 26, 2010 at 17:56, Jesse Gross <jesse@nicira.com> wrote:
> Commit eab6d18d "vlan: Don't check for vlan group before
> vlan_tx_tag_present" removed the need for the adapter variable
> in igb_xmit_frame_ring_adv(). This removes the variable as well
> to avoid the compiler warning.
>
> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
> Signed-off-by: Jesse Gross <jesse@nicira.com>
> ---
> drivers/net/igb/igb_main.c | 1 -
> 1 files changed, 0 insertions(+), 1 deletions(-)
>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
^ permalink raw reply
* [PATCH 1/1] net/unix: Allow Unix sockets to be treated like normal files
From: x @ 2010-10-28 2:24 UTC (permalink / raw)
To: netdev; +Cc: Dave Miller, Jeff Hansen
From: Jeff Hansen <x@jeffhansen.com>
Resent. Is there anything else I need to do for this patch to get reviewed
and/or merged? Any comments by anyone?
This allows Unix sockets to be opened, written, read, and closed, like
normal files. This can be especially handy from, for example, a shell
script that wants to send a short message to a Unix socket, but doesn't
want to and/or cannot create the socket itself.
This will try to open the Unix socket first in SOCK_DGRAM mode, then
SOCK_STREAM mode if that fails.
Signed-off-by: Jeff Hansen <x@jeffhansen.com>
---
net/unix/Kconfig | 10 +++++
net/unix/af_unix.c | 113 ++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 123 insertions(+), 0 deletions(-)
diff --git a/net/unix/Kconfig b/net/unix/Kconfig
index 5a69733..68df4f1 100644
--- a/net/unix/Kconfig
+++ b/net/unix/Kconfig
@@ -19,3 +19,13 @@ config UNIX
Say Y unless you know what you are doing.
+config UNIX_FOPS
+ boolean "Allow Unix sockets to be treated like normal files"
+ depends on UNIX
+ ---help---
+ If you say Y here, Unix sockets may be opened, written, read, and
+ closed, like normal files. This is handy for sending short commands
+ to Unix sockets (i.e. from shell scripts), without having to create
+ a Unix socket.
+
+ Say Y unless you know what you are doing.
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 0ebc777..b5a6655 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -798,6 +798,114 @@ fail:
return NULL;
}
+#ifdef CONFIG_UNIX_FOPS
+static int unix_open(struct inode *inode, struct file *filp)
+{
+ int err;
+ struct socket *sock = NULL;
+ struct dentry *dentry = filp->f_dentry;
+ struct sockaddr_un sunaddr = { 0 };
+ char *p;
+
+ if (!filp)
+ return -ENXIO;
+ dentry = filp->f_dentry;
+
+ if (!dentry || !dentry->d_parent)
+ return -ENXIO;
+
+ if (filp->private_data)
+ return -EBUSY;
+
+ sunaddr.sun_family = AF_UNIX;
+ p = d_path(&filp->f_path, sunaddr.sun_path, sizeof(sunaddr.sun_path));
+ if (IS_ERR(p))
+ return PTR_ERR(p);
+ memmove(sunaddr.sun_path, p, p[sizeof(sunaddr.sun_path) - 1] ?
+ sizeof(sunaddr.sun_path) : strlen(p));
+
+ err = sock_create(PF_UNIX, SOCK_DGRAM, 0, &sock);
+ if (err)
+ return err;
+
+ err = unix_dgram_connect(sock, (struct sockaddr *)&sunaddr,
+ sizeof(sunaddr), 0);
+ if (err) {
+ sock_release(sock);
+
+ err = sock_create(PF_UNIX, SOCK_STREAM, 0, &sock);
+ if (err)
+ return err;
+
+ err = unix_stream_connect(sock, (struct sockaddr *)&sunaddr,
+ sizeof(sunaddr), 0);
+
+ if (err)
+ return err;
+ }
+ filp->private_data = sock;
+
+ return err;
+}
+
+static int unix_frelease(struct inode *inode, struct file *filp)
+{
+ if (!filp->private_data)
+ return -ENXIO;
+
+ sock_release(filp->private_data);
+ filp->private_data = NULL;
+ return 0;
+}
+
+static ssize_t unix_readwrite(struct file *filp, void *buf,
+ size_t _len, loff_t *ppos, int do_write)
+{
+ struct socket *sock = filp->private_data;
+ int len = (int)_len, err;
+ struct kvec iov = {
+ .iov_base = buf,
+ .iov_len = len,
+ };
+ struct msghdr msg = {
+ /* NB: struct iovec and kvec are equal */
+ .msg_iov = (struct iovec *)&iov,
+ .msg_iovlen = 1,
+ };
+
+ if (!sock)
+ return -ENXIO;
+ if (_len > 0xffffffffLL)
+ return -E2BIG;
+
+ err = do_write ? sock_sendmsg(sock, &msg, len) :
+ sock_recvmsg(sock, &msg, len, 0);
+ if (err > 0 && ppos)
+ *ppos += err;
+
+ return err;
+}
+
+static ssize_t unix_write(struct file *filp, const char __user *buf,
+ size_t _len, loff_t *ppos)
+{
+ return unix_readwrite(filp, (void *)buf, _len, ppos, 0);
+}
+
+static ssize_t unix_read(struct file *filp, const char __user *buf,
+ size_t _len, loff_t *ppos)
+{
+ return unix_readwrite(filp, (void *)buf, _len, ppos, 0);
+}
+
+const struct file_operations unix_sock_fops = {
+ .owner = THIS_MODULE,
+ .open = unix_open,
+ .release = unix_frelease,
+ .write = unix_write,
+ .read = unix_read,
+};
+#endif /* CONFIG_UNIX_FOPS */
static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
{
@@ -874,6 +982,11 @@ out_mknod_drop_write:
mnt_drop_write(nd.path.mnt);
if (err)
goto out_mknod_dput;
+
+#ifdef CONFIG_UNIX_FOPS
+ dentry->d_inode->i_fop = &unix_sock_fops;
+#endif
+
mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
dput(nd.path.dentry);
nd.path.dentry = dentry;
--
1.7.0.4
^ permalink raw reply related
* Re: [PATCH 1/1] net/unix: Allow Unix sockets to be treated like normal files
From: David Miller @ 2010-10-28 2:32 UTC (permalink / raw)
To: x; +Cc: netdev
In-Reply-To: <1288232669-8927-1-git-send-email-x@jeffhansen.com>
From: x@jeffhansen.com
Date: Thu, 28 Oct 2010 02:24:29 +0000
> Resent. Is there anything else I need to do for this patch to get reviewed
> and/or merged? Any comments by anyone?
I don't like this idea at all.
I remember there is a reason why similar things are not allowed for
sockets, it causes all sorts of problems although I forget the exact
details.
Take a look at net/socket.c:sock_no_open(), for example.
^ permalink raw reply
* Re: [PATCH] igb: Fix unused variable warning.
From: David Miller @ 2010-10-28 2:44 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: jesse, netdev
In-Reply-To: <AANLkTi=iDMYjMsvbeR9-ZhCE5zMHacO-vM5wTXvx4S9K@mail.gmail.com>
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Wed, 27 Oct 2010 19:13:23 -0700
> On Tue, Oct 26, 2010 at 17:56, Jesse Gross <jesse@nicira.com> wrote:
>> Commit eab6d18d "vlan: Don't check for vlan group before
>> vlan_tx_tag_present" removed the need for the adapter variable
>> in igb_xmit_frame_ring_adv(). This removes the variable as well
>> to avoid the compiler warning.
>>
>> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
>> Signed-off-by: Jesse Gross <jesse@nicira.com>
>> ---
>> drivers/net/igb/igb_main.c | 1 -
>> 1 files changed, 0 insertions(+), 1 deletions(-)
>>
>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Applied.
^ permalink raw reply
* Re: [PATCH 1/1] net/unix: Allow Unix sockets to be treated like normal files
From: Jeff Hansen @ 2010-10-28 2:50 UTC (permalink / raw)
To: David Miller; +Cc: netdev
In-Reply-To: <20101027.193212.193709254.davem@davemloft.net>
Dave,
I agree that on larger systems this doesn't really make sense, but on
embedded platforms this can save some code space since applications can
get rid of their FIFO listeners and have strictly socket listeners.
That's why I made it an option that could be disabled by default.
Do you know who originally suggested that "creepy crawlies" are
introduced by allowing sockets to be opened? I'd be interested to know
how this could affect security, if at all.
-Jeff
On 10/27/2010 08:32 PM, David Miller wrote:
> From: x@jeffhansen.com
> Date: Thu, 28 Oct 2010 02:24:29 +0000
>
>> Resent. Is there anything else I need to do for this patch to get reviewed
>> and/or merged? Any comments by anyone?
> I don't like this idea at all.
>
> I remember there is a reason why similar things are not allowed for
> sockets, it causes all sorts of problems although I forget the exact
> details.
>
> Take a look at net/socket.c:sock_no_open(), for example.
>
--
---------------------------------------------------
"If someone's gotta do it, it might as well be me."
x@jeffhansen.com
^ permalink raw reply
* [PATCH 1/2] vmxnet3: remove unnecessary byteswapping in BAR writing macros
From: Harvey Harrison @ 2010-10-28 3:12 UTC (permalink / raw)
To: sbhatewara; +Cc: netdev
readl/writel swap to little-endian internally.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
---
drivers/net/vmxnet3/vmxnet3_int.h | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/vmxnet3/vmxnet3_int.h b/drivers/net/vmxnet3/vmxnet3_int.h
index 8a2f471..edf2288 100644
--- a/drivers/net/vmxnet3/vmxnet3_int.h
+++ b/drivers/net/vmxnet3/vmxnet3_int.h
@@ -330,14 +330,14 @@ struct vmxnet3_adapter {
};
#define VMXNET3_WRITE_BAR0_REG(adapter, reg, val) \
- writel(cpu_to_le32(val), (adapter)->hw_addr0 + (reg))
+ writel((val), (adapter)->hw_addr0 + (reg))
#define VMXNET3_READ_BAR0_REG(adapter, reg) \
- le32_to_cpu(readl((adapter)->hw_addr0 + (reg)))
+ readl((adapter)->hw_addr0 + (reg))
#define VMXNET3_WRITE_BAR1_REG(adapter, reg, val) \
- writel(cpu_to_le32(val), (adapter)->hw_addr1 + (reg))
+ writel((val), (adapter)->hw_addr1 + (reg))
#define VMXNET3_READ_BAR1_REG(adapter, reg) \
- le32_to_cpu(readl((adapter)->hw_addr1 + (reg)))
+ readl((adapter)->hw_addr1 + (reg))
#define VMXNET3_WAKE_QUEUE_THRESHOLD(tq) (5)
#define VMXNET3_RX_ALLOC_THRESHOLD(rq, ring_idx, adapter) \
--
1.7.1
^ permalink raw reply related
* [PATCH 2/2] vmxnet: trivial annotation of protocol constant
From: Harvey Harrison @ 2010-10-28 3:12 UTC (permalink / raw)
To: sbhatewara; +Cc: netdev
In-Reply-To: <1288235555-24675-1-git-send-email-harvey.harrison@gmail.com>
Noticed by sparse:
drivers/net/vmxnet3/vmxnet3_drv.c:876:38: warning: cast from restricted __be16
drivers/net/vmxnet3/vmxnet3_drv.c:876:38: warning: cast from restricted __be16
drivers/net/vmxnet3/vmxnet3_drv.c:876:24: warning: restricted __be16 degrades to integer
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
---
drivers/net/vmxnet3/vmxnet3_drv.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index e3658e1..21314e0 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -873,7 +873,7 @@ vmxnet3_tq_xmit(struct sk_buff *skb, struct vmxnet3_tx_queue *tq,
count = VMXNET3_TXD_NEEDED(skb_headlen(skb)) +
skb_shinfo(skb)->nr_frags + 1;
- ctx.ipv4 = (skb->protocol == __constant_ntohs(ETH_P_IP));
+ ctx.ipv4 = (skb->protocol == cpu_to_be16(ETH_P_IP));
ctx.mss = skb_shinfo(skb)->gso_size;
if (ctx.mss) {
--
1.7.1
^ permalink raw reply related
* Re: cxgb3: kernel access of bad area with v2.6.36-6794-g12ba8d1
From: Eric Dumazet @ 2010-10-28 3:29 UTC (permalink / raw)
To: Nishanth Aravamudan; +Cc: Divy Le Ray, sonnyrao, netdev, linux-kernel
In-Reply-To: <20101028015407.GA9564@us.ibm.com>
Le mercredi 27 octobre 2010 à 18:54 -0700, Nishanth Aravamudan a écrit :
> Hi,
>
> I'm seeing the following trace w/ current git on a machine in our lab:
>
> Chelsio T3 Network Driver - version 1.1.4-ko
> cxgb3 0003:01:00.0: enabling device (0140 -> 0142)
> Unable to handle kernel paging request for data at address 0x00000010
> Faulting instruction address: 0xd000000008473ae8
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=32 NUMA pSeries
> last sysfs file: /sys/devices/virtual/block/dm-0/dev
> Modules linked in: cxgb3(+) mdio ehea ib_ehca ib_core ext4 jbd2 mbcache sd_mod crc_t10dif ipr dm_mod [last unloaded: scsi_wait_scan]
> NIP: d000000008473ae8 LR: d000000008473ac4 CTR: c0000000004398a0
> REGS: c0000007a157f190 TRAP: 0300 Not tainted (2.6.36)
> MSR: 8000000000009032 <EE,ME,IR,DR> CR: 24424444 XER: 00000000
> DAR: 0000000000000010, DSISR: 0000000040000000
> TASK = c0000007a3755290[741] 'modprobe' THREAD: c0000007a157c000 CPU: 24
> GPR00: 0000000000000000 c0000007a157f410 d000000008486978 c0000007a526c000
> GPR04: c0000000006d25dd c0000007a526c005 c0000007a526c29e 0000000000000002
> GPR08: 0000000000000004 0000000000000010 c0000007a526c0a0 0000000000000000
> GPR12: d000000008474aa8 c00000000eed3c00 d00000000847aeb8 0000000000000001
> GPR16: 0000000000001000 0000000000000000 d000000008477aa8 00003c047ef7e000
> GPR20: c0000007a8b7d280 c0000007a8b7d310 d00000000847d1c0 d00000000847d1d8
> GPR24: 0000000000000003 00003c047ef7efff 0000000000000001 c0000007a3c1c000
> GPR28: 0000000000000000 c0000007a526c000 d000000008484210 c0000007a3c1c000
> NIP [d000000008473ae8] .init_one+0x510/0xb7c [cxgb3]
> LR [d000000008473ac4] .init_one+0x4ec/0xb7c [cxgb3]
> Call Trace:
> [c0000007a157f410] [d000000008473ac4] .init_one+0x4ec/0xb7c [cxgb3] (unreliable)
> [c0000007a157f560] [c0000000002e40bc] .local_pci_probe+0x7c/0x100
> [c0000007a157f5f0] [c0000000002e5018] .pci_device_probe+0x148/0x150
> [c0000007a157f6a0] [c00000000034df68] .driver_probe_device+0x128/0x330
> [c0000007a157f750] [c00000000034e27c] .__driver_attach+0x10c/0x110
> [c0000007a157f7e0] [c00000000034d15c] .bus_for_each_dev+0x9c/0xf0
> [c0000007a157f890] [c00000000034dbc8] .driver_attach+0x28/0x40
> [c0000007a157f910] [c00000000034c648] .bus_add_driver+0x218/0x3d0
> [c0000007a157f9c0] [c00000000034e718] .driver_register+0x98/0x1d0
> [c0000007a157fa60] [c0000000002e5354] .__pci_register_driver+0x64/0x140
> [c0000007a157fb00] [d000000008474278] .cxgb3_init_module+0x2c/0x44 [cxgb3]
> [c0000007a157fb80] [c000000000009754] .do_one_initcall+0x64/0x1e0
> [c0000007a157fc40] [c0000000000d28b8] .SyS_init_module+0x1b8/0x1790
> [c0000007a157fe30] [c000000000008564] syscall_exit+0x0/0x40
> Instruction dump:
> 9b890018 9b090019 48000fe9 e8410028 801d0308 2f800000 419e003c 39600000
> e93d0300 796045e4 7d290214 39290010 <7c0048a8> 7c00d378 7c0049ad 40a2fff4
> ---[ end trace 2a530df8c4ad3d70 ]---
> udevd-work[600]: '/sbin/modprobe -b pci:v00001425d00000030sv00001014sd0000038Cbc02sc00i00' unexpected exit with status 0x000b
>
> I did an objdump -ldr of cxgb3.ko and:
>
>
> 4c0: 48 00 00 01 bl 4c0 <.init_one+0x4c0>
> 4c0: R_PPC64_REL24 .alloc_etherdev_mq
> 4c4: 60 00 00 00 nop
> 4c8: 7c 7d 1b 79 mr. r29,r3
> 4cc: 41 82 03 28 beq- 7f4 <.init_one+0x7f4>
> 4d0: 39 3d 07 00 addi r9,r29,1792
> 4d4: fa bd 03 f8 std r21,1016(r29)
> 4d8: fb bb 32 08 std r29,12808(r27)
> 4dc: fb fd 07 00 std r31,1792(r29)
> 4e0: 9b 89 00 18 stb r28,24(r9)
> 4e4: 9b 09 00 19 stb r24,25(r9)
> 4e8: 48 00 00 01 bl 4e8 <.init_one+0x4e8>
> 4e8: R_PPC64_REL24 .netif_carrier_off
> 4ec: 60 00 00 00 nop
> 4f0: 80 1d 03 08 lwz r0,776(r29)
> 4f4: 2f 80 00 00 cmpwi cr7,r0,0
> 4f8: 41 9e 00 3c beq- cr7,534 <.init_one+0x534>
> 4fc: 39 60 00 00 li r11,0
> 500: e9 3d 03 00 ld r9,768(r29)
> 504: 79 60 45 e4 rldicr r0,r11,8,55
> 508: 7d 29 02 14 add r9,r9,r0
> 50c: 39 29 00 10 addi r9,r9,16
> 510: 7c 00 48 a8 ldarx r0,0,r9
> 514: 7c 00 d3 78 or r0,r0,r26
> 518: 7c 00 49 ad stdcx. r0,0,r9
> 51c: 40 a2 ff f4 bne- 510 <.init_one+0x510>
>
> So I'm guessing it's somewhere in here:
>
> for (i = 0; i < ai->nports0 + ai->nports1; ++i) {
> struct net_device *netdev;
>
> netdev = alloc_etherdev_mq(sizeof(struct port_info), SGE_QSETS);
> if (!netdev) {
> err = -ENOMEM;
> goto out_free_dev;
> }
>
> SET_NETDEV_DEV(netdev, &pdev->dev);
>
> adapter->port[i] = netdev;
> pi = netdev_priv(netdev);
> pi->adapter = adapter;
> pi->rx_offload = T3_RX_CSUM | T3_LRO;
> pi->port_id = i;
> netif_carrier_off(netdev);
> netif_tx_stop_all_queues(netdev);
> netdev->irq = pdev->irq;
> netdev->mem_start = mmio_start;
> netdev->mem_end = mmio_start + mmio_len - 1;
> netdev->features |= NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO;
> netdev->features |= NETIF_F_GRO;
> if (pci_using_dac)
> netdev->features |= NETIF_F_HIGHDMA;
>
> netdev->features |= NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX;
> netdev->netdev_ops = &cxgb_netdev_ops;
> SET_ETHTOOL_OPS(netdev, &cxgb_ethtool_ops);
> }
>
> Well, presuming the trace is mostly accurate? I'm not sure what else is
> needed to determine the problem further. I'm building 2.6.36 as I write
> this. But it doesn't seem like this code has changed much and I had a
> working kernel around 2.6.36-rc7...
>
It seems this crash is because alloc_etherdev_mq() (alloc_netdev_mq())
not anymore allocates tx_queues.
So netif_tx_stop_all_queues(netdev) is crashing, accessing a NULL
pointer in your driver.
netif_tx_stop_all_queues(netdev) should be called only once device is
registered : netif_alloc_netdev_queues() is called from
register_netdevice()
Take a look at commit 8f6d9f40476895571 to have an example of how to fix
this problem.
Thanks
^ permalink raw reply
* [PATCH] via-velocity: Codestyle fixes
From: alex @ 2010-10-28 3:48 UTC (permalink / raw)
To: netdev; +Cc: linux-kernel, alex
From: Alexander Wigen <alex@wigen.net>
checkpatch.pl --terse --file drivers/net/via-velocity.c
Before: total: 13 errors, 106 warnings, 3537 lines checked
After: total: 0 errors, 96 warnings, 3536 lines checked
Signed-off-by: Alexander Wigen <alex@wigen.net>
---
drivers/net/via-velocity.c | 187 ++++++++++++++++++++++----------------------
1 files changed, 93 insertions(+), 94 deletions(-)
diff --git a/drivers/net/via-velocity.c b/drivers/net/via-velocity.c
index cab96ad..05c2e1e 100644
--- a/drivers/net/via-velocity.c
+++ b/drivers/net/via-velocity.c
@@ -84,7 +84,7 @@ static int velocity_nics;
static int msglevel = MSG_LEVEL_INFO;
/**
- * mac_get_cam_mask - Read a CAM mask
+ * mac_get_cam_mask - Read a CAM mask
* @regs: register block for this velocity
* @mask: buffer to store mask
*
@@ -113,7 +113,7 @@ static void mac_get_cam_mask(struct mac_regs __iomem *regs, u8 *mask)
/**
- * mac_set_cam_mask - Set a CAM mask
+ * mac_set_cam_mask - Set a CAM mask
* @regs: register block for this velocity
* @mask: CAM mask to load
*
@@ -156,7 +156,7 @@ static void mac_set_vlan_cam_mask(struct mac_regs __iomem *regs, u8 *mask)
}
/**
- * mac_set_cam - set CAM data
+ * mac_set_cam - set CAM data
* @regs: register block of this velocity
* @idx: Cam index
* @addr: 2 or 6 bytes of CAM data
@@ -211,11 +211,11 @@ static void mac_set_vlan_cam(struct mac_regs __iomem *regs, int idx,
/**
- * mac_wol_reset - reset WOL after exiting low power
+ * mac_wol_reset - reset WOL after exiting low power
* @regs: register block of this velocity
*
* Called after we drop out of wake on lan mode in order to
- * reset the Wake on lan features. This function doesn't restore
+ * reset the Wake On Lan features. This function doesn't restore
* the rest of the logic from the result of sleep/wakeup
*/
static void mac_wol_reset(struct mac_regs __iomem *regs)
@@ -342,7 +342,7 @@ VELOCITY_PARAM(ValPktLen, "Receiving or Drop invalid 802.3 frame");
1: Wake up if link status is on/off.
2: Wake up if recevied an arp packet.
4: Wake up if recevied any unicast packet.
- Those value can be sumed up to support more than one option.
+ Those value can be summed up to support more than one option.
*/
VELOCITY_PARAM(wol_opts, "Wake On Lan options");
@@ -370,7 +370,7 @@ static DEFINE_PCI_DEVICE_TABLE(velocity_id_table) = {
MODULE_DEVICE_TABLE(pci, velocity_id_table);
/**
- * get_chip_name - identifier to name
+ * get_chip_name - identifier to name
* @id: chip identifier
*
* Given a chip identifier return a suitable description. Returns
@@ -386,7 +386,7 @@ static const char __devinit *get_chip_name(enum chip_type chip_id)
}
/**
- * velocity_remove1 - device unplug
+ * velocity_remove1 - device unplug
* @pdev: PCI device being removed
*
* Device unload callback. Called on an unplug or on module
@@ -409,7 +409,7 @@ static void __devexit velocity_remove1(struct pci_dev *pdev)
}
/**
- * velocity_set_int_opt - parser for integer options
+ * velocity_set_int_opt - parser for integer options
* @opt: pointer to option value
* @val: value the user requested (or -1 for default)
* @min: lowest value allowed
@@ -438,7 +438,7 @@ static void __devinit velocity_set_int_opt(int *opt, int val, int min, int max,
}
/**
- * velocity_set_bool_opt - parser for boolean options
+ * velocity_set_bool_opt - parser for boolean options
* @opt: pointer to option value
* @val: value the user requested (or -1 for default)
* @def: default value (yes/no)
@@ -467,7 +467,7 @@ static void __devinit velocity_set_bool_opt(u32 *opt, int val, int def, u32 flag
}
/**
- * velocity_get_options - set options on device
+ * velocity_get_options - set options on device
* @opts: option structure for the device
* @index: index of option to use in module options array
* @devname: device name
@@ -492,7 +492,7 @@ static void __devinit velocity_get_options(struct velocity_opt *opts, int index,
}
/**
- * velocity_init_cam_filter - initialise CAM
+ * velocity_init_cam_filter - initialize CAM
* @vptr: velocity to program
*
* Initialize the content addressable memory used for filters. Load
@@ -507,8 +507,8 @@ static void velocity_init_cam_filter(struct velocity_info *vptr)
WORD_REG_BITS_ON(MCFG_VIDFR, ®s->MCFG);
/* Disable all CAMs */
- memset(vptr->vCAMmask, 0, sizeof(u8) * 8);
- memset(vptr->mCAMmask, 0, sizeof(u8) * 8);
+ memset(vptr->vCAMmask, 0, sizeof(u8) *8);
+ memset(vptr->mCAMmask, 0, sizeof(u8) *8);
mac_set_vlan_cam_mask(regs, vptr->vCAMmask);
mac_set_cam_mask(regs, vptr->mCAMmask);
@@ -564,7 +564,7 @@ static void velocity_init_rx_ring_indexes(struct velocity_info *vptr)
}
/**
- * velocity_rx_reset - handle a receive reset
+ * velocity_rx_reset - handle a receive reset
* @vptr: velocity we are resetting
*
* Reset the ownership and status for the receive ring side.
@@ -591,7 +591,7 @@ static void velocity_rx_reset(struct velocity_info *vptr)
}
/**
- * velocity_get_opt_media_mode - get media selection
+ * velocity_get_opt_media_mode - get media selection
* @vptr: velocity adapter
*
* Get the media mode stored in EEPROM or module options and load
@@ -627,7 +627,7 @@ static u32 velocity_get_opt_media_mode(struct velocity_info *vptr)
}
/**
- * safe_disable_mii_autopoll - autopoll off
+ * safe_disable_mii_autopoll - autopoll off
* @regs: velocity registers
*
* Turn off the autopoll and wait for it to disable on the chip
@@ -646,7 +646,7 @@ static void safe_disable_mii_autopoll(struct mac_regs __iomem *regs)
}
/**
- * enable_mii_autopoll - turn on autopolling
+ * enable_mii_autopoll - turn on autopolling
* @regs: velocity registers
*
* Enable the MII link status autopoll feature on the Velocity
@@ -676,7 +676,7 @@ static void enable_mii_autopoll(struct mac_regs __iomem *regs)
}
/**
- * velocity_mii_read - read MII data
+ * velocity_mii_read - read MII data
* @regs: velocity registers
* @index: MII register index
* @data: buffer for received data
@@ -712,7 +712,7 @@ static int velocity_mii_read(struct mac_regs __iomem *regs, u8 index, u16 *data)
/**
- * mii_check_media_mode - check media state
+ * mii_check_media_mode - check media state
* @regs: velocity registers
*
* Check the current MII status and determine the link status
@@ -755,7 +755,7 @@ static u32 mii_check_media_mode(struct mac_regs __iomem *regs)
}
/**
- * velocity_mii_write - write MII data
+ * velocity_mii_write - write MII data
* @regs: velocity registers
* @index: MII register index
* @data: 16bit data for the MII register
@@ -794,7 +794,7 @@ static int velocity_mii_write(struct mac_regs __iomem *regs, u8 mii_addr, u16 da
}
/**
- * set_mii_flow_control - flow control setup
+ * set_mii_flow_control - flow control setup
* @vptr: velocity interface
*
* Set up the flow control on this interface according to
@@ -829,10 +829,10 @@ static void set_mii_flow_control(struct velocity_info *vptr)
}
/**
- * mii_set_auto_on - autonegotiate on
+ * mii_set_auto_on - autonegotiate on
* @vptr: velocity
*
- * Enable autonegotation on this interface
+ * Enable autonegotiation on this interface
*/
static void mii_set_auto_on(struct velocity_info *vptr)
{
@@ -879,7 +879,7 @@ static u32 check_connection_type(struct mac_regs __iomem *regs)
/**
- * velocity_set_media_mode - set media mode
+ * velocity_set_media_mode - set media mode
* @mii_status: old MII link state
*
* Check the media link state and configure the flow control
@@ -999,7 +999,7 @@ static int velocity_set_media_mode(struct velocity_info *vptr, u32 mii_status)
}
/**
- * velocity_print_link_status - link status reporting
+ * velocity_print_link_status - link status reporting
* @vptr: velocity to report on
*
* Turn the link status of the velocity card into a kernel log
@@ -1050,7 +1050,7 @@ static void velocity_print_link_status(struct velocity_info *vptr)
}
/**
- * enable_flow_control_ability - flow control
+ * enable_flow_control_ability - flow control
* @vptr: veloity to configure
*
* Set up flow control according to the flow control options
@@ -1102,7 +1102,7 @@ static void enable_flow_control_ability(struct velocity_info *vptr)
}
/**
- * velocity_soft_reset - soft reset
+ * velocity_soft_reset - soft reset
* @vptr: velocity to reset
*
* Kick off a soft reset of the velocity adapter and then poll
@@ -1131,7 +1131,7 @@ static int velocity_soft_reset(struct velocity_info *vptr)
}
/**
- * velocity_set_multi - filter list change callback
+ * velocity_set_multi - filter list change callback
* @dev: network device
*
* Called by the network layer when the filter lists need to change
@@ -1181,7 +1181,7 @@ static void velocity_set_multi(struct net_device *dev)
*/
/**
- * mii_init - set up MII
+ * mii_init - set up MII
* @vptr: velocity adapter
* @mii_status: links tatus
*
@@ -1250,7 +1250,7 @@ static void mii_init(struct velocity_info *vptr, u32 mii_status)
}
/**
- * setup_queue_timers - Setup interrupt timers
+ * setup_queue_timers - Setup interrupt timers
*
* Setup interrupt frequency during suppression (timeout if the frame
* count isn't filled).
@@ -1273,7 +1273,7 @@ static void setup_queue_timers(struct velocity_info *vptr)
}
}
/**
- * setup_adaptive_interrupts - Setup interrupt suppression
+ * setup_adaptive_interrupts - Setup interrupt suppression
*
* @vptr velocity adapter
*
@@ -1311,11 +1311,11 @@ static void setup_adaptive_interrupts(struct velocity_info *vptr)
}
/**
- * velocity_init_registers - initialise MAC registers
+ * velocity_init_registers - initialise MAC registers
* @vptr: velocity to init
* @type: type of initialisation (hot or cold)
*
- * Initialise the MAC on a reset or on first set up on the
+ * Initialize the MAC on a reset or on first set up on the
* hardware.
*/
static void velocity_init_registers(struct velocity_info *vptr,
@@ -1459,7 +1459,7 @@ static void velocity_give_many_rx_descs(struct velocity_info *vptr)
}
/**
- * velocity_init_dma_rings - set up DMA rings
+ * velocity_init_dma_rings - set up DMA rings
* @vptr: Velocity to set up
*
* Allocate PCI mapped DMA rings for the receive and transmit layer
@@ -1511,7 +1511,7 @@ static void velocity_set_rxbufsize(struct velocity_info *vptr, int mtu)
}
/**
- * velocity_alloc_rx_buf - allocate aligned receive buffer
+ * velocity_alloc_rx_buf - allocate aligned receive buffer
* @vptr: velocity
* @idx: ring index
*
@@ -1542,7 +1542,7 @@ static int velocity_alloc_rx_buf(struct velocity_info *vptr, int idx)
* Fill in the descriptor to match
*/
- *((u32 *) & (rd->rdesc0)) = 0;
+ *((u32 *) &(rd->rdesc0)) = 0;
rd->size = cpu_to_le16(vptr->rx.buf_sz) | RX_INTEN;
rd->pa_low = cpu_to_le32(rd_info->skb_dma);
rd->pa_high = 0;
@@ -1578,7 +1578,7 @@ static int velocity_rx_refill(struct velocity_info *vptr)
}
/**
- * velocity_free_rd_ring - free receive ring
+ * velocity_free_rd_ring - free receive ring
* @vptr: velocity to clean up
*
* Free the receive buffers for each ring slot and any
@@ -1614,7 +1614,7 @@ static void velocity_free_rd_ring(struct velocity_info *vptr)
/**
- * velocity_init_rd_ring - set up receive ring
+ * velocity_init_rd_ring - set up receive ring
* @vptr: velocity to configure
*
* Allocate and set up the receive buffers for each ring slot and
@@ -1644,7 +1644,7 @@ out:
}
/**
- * velocity_init_td_ring - set up transmit ring
+ * velocity_init_td_ring - set up transmit ring
* @vptr: velocity
*
* Set up the transmit ring and chain the ring pointers together.
@@ -1673,7 +1673,7 @@ static int velocity_init_td_ring(struct velocity_info *vptr)
}
/**
- * velocity_free_dma_rings - free PCI ring pointers
+ * velocity_free_dma_rings - free PCI ring pointers
* @vptr: Velocity to free from
*
* Clean up the PCI ring buffers allocated to this velocity.
@@ -1681,7 +1681,7 @@ static int velocity_init_td_ring(struct velocity_info *vptr)
static void velocity_free_dma_rings(struct velocity_info *vptr)
{
const int size = vptr->options.numrx * sizeof(struct rx_desc) +
- vptr->options.numtx * sizeof(struct tx_desc) * vptr->tx.numq;
+ vptr->options.numtx * sizeof(struct tx_desc) *vptr->tx.numq;
pci_free_consistent(vptr->pdev, size, vptr->rx.ring, vptr->rx.pool_dma);
}
@@ -1715,7 +1715,7 @@ err_free_dma_rings_0:
}
/**
- * velocity_free_tx_buf - free transmit buffer
+ * velocity_free_tx_buf - free transmit buffer
* @vptr: velocity
* @tdinfo: buffer
*
@@ -1776,7 +1776,7 @@ static void velocity_free_td_ring_entry(struct velocity_info *vptr,
}
/**
- * velocity_free_td_ring - free td ring
+ * velocity_free_td_ring - free td ring
* @vptr: velocity
*
* Free up the transmit ring for this particular velocity adapter.
@@ -1806,7 +1806,7 @@ static void velocity_free_rings(struct velocity_info *vptr)
}
/**
- * velocity_error - handle error from controller
+ * velocity_error - handle error from controller
* @vptr: velocity
* @status: card status
*
@@ -1895,7 +1895,7 @@ static void velocity_error(struct velocity_info *vptr, int status)
}
/**
- * tx_srv - transmit interrupt service
+ * tx_srv - transmit interrupt service
* @vptr; Velocity
*
* Scan the queues looking for transmitted packets that
@@ -1963,7 +1963,7 @@ static int velocity_tx_srv(struct velocity_info *vptr)
}
/**
- * velocity_rx_csum - checksum process
+ * velocity_rx_csum - checksum process
* @rd: receive packet descriptor
* @skb: network layer packet buffer
*
@@ -1987,14 +1987,14 @@ static inline void velocity_rx_csum(struct rx_desc *rd, struct sk_buff *skb)
}
/**
- * velocity_rx_copy - in place Rx copy for small packets
+ * velocity_rx_copy - in place Rx copy for small packets
* @rx_skb: network layer packet buffer candidate
* @pkt_size: received data size
* @rd: receive packet descriptor
* @dev: network device
*
* Replace the current skb that is scheduled for Rx processing by a
- * shorter, immediatly allocated skb, if the received packet is small
+ * shorter, immediately allocated skb, if the received packet is small
* enough. This function returns a negative value if the received
* packet is too big or if memory is exhausted.
*/
@@ -2018,7 +2018,7 @@ static int velocity_rx_copy(struct sk_buff **rx_skb, int pkt_size,
}
/**
- * velocity_iph_realign - IP header alignment
+ * velocity_iph_realign - IP header alignment
* @vptr: velocity we are handling
* @skb: network layer packet buffer
* @pkt_size: received data size
@@ -2037,7 +2037,7 @@ static inline void velocity_iph_realign(struct velocity_info *vptr,
/**
- * velocity_receive_frame - received packet processor
+ * velocity_receive_frame - received packet processor
* @vptr: velocity we are handling
* @idx: ring index
*
@@ -2107,7 +2107,7 @@ static int velocity_receive_frame(struct velocity_info *vptr, int idx)
/**
- * velocity_rx_srv - service RX interrupt
+ * velocity_rx_srv - service RX interrupt
* @vptr: velocity
*
* Walk the receive ring of the velocity adapter and remove
@@ -2191,7 +2191,7 @@ static int velocity_poll(struct napi_struct *napi, int budget)
}
/**
- * velocity_intr - interrupt callback
+ * velocity_intr - interrupt callback
* @irq: interrupt number
* @dev_instance: interrupting device
*
@@ -2232,14 +2232,14 @@ static irqreturn_t velocity_intr(int irq, void *dev_instance)
}
/**
- * velocity_open - interface activation callback
+ * velocity_open - interface activation callback
* @dev: network layer device to open
*
* Called when the network layer brings the interface up. Returns
* a negative posix error code on failure, or zero on success.
*
* All the ring allocation and set up is done on open for this
- * adapter to minimise memory usage when inactive
+ * adapter to minimize memory usage when inactive
*/
static int velocity_open(struct net_device *dev)
{
@@ -2275,7 +2275,7 @@ out:
}
/**
- * velocity_shutdown - shut down the chip
+ * velocity_shutdown - shut down the chip
* @vptr: velocity to deactivate
*
* Shuts down the internal operations of the velocity and
@@ -2293,7 +2293,7 @@ static void velocity_shutdown(struct velocity_info *vptr)
}
/**
- * velocity_change_mtu - MTU change callback
+ * velocity_change_mtu - MTU change callback
* @dev: network device
* @new_mtu: desired MTU
*
@@ -2374,7 +2374,7 @@ out_0:
}
/**
- * velocity_mii_ioctl - MII ioctl handler
+ * velocity_mii_ioctl - MII ioctl handler
* @dev: network device
* @ifr: the ifreq block for the ioctl
* @cmd: the command
@@ -2415,7 +2415,7 @@ static int velocity_mii_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd
/**
- * velocity_ioctl - ioctl entry point
+ * velocity_ioctl - ioctl entry point
* @dev: network device
* @rq: interface request ioctl
* @cmd: command code
@@ -2452,7 +2452,7 @@ static int velocity_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
}
/**
- * velocity_get_status - statistics callback
+ * velocity_get_status - statistics callback
* @dev: network device
*
* Callback from the network layer to allow driver statistics
@@ -2477,24 +2477,24 @@ static struct net_device_stats *velocity_get_stats(struct net_device *dev)
dev->stats.rx_errors = vptr->mib_counter[HW_MIB_ifRxErrorPkts];
dev->stats.rx_length_errors = vptr->mib_counter[HW_MIB_ifInRangeLengthErrors];
-// unsigned long rx_dropped; /* no space in linux buffers */
+/* unsigned long rx_dropped; /* no space in linux buffers */
dev->stats.collisions = vptr->mib_counter[HW_MIB_ifTxEtherCollisions];
/* detailed rx_errors: */
-// unsigned long rx_length_errors;
-// unsigned long rx_over_errors; /* receiver ring buff overflow */
+/* unsigned long rx_length_errors;
+/* unsigned long rx_over_errors; /* receiver ring buff overflow */
dev->stats.rx_crc_errors = vptr->mib_counter[HW_MIB_ifRxPktCRCE];
-// unsigned long rx_frame_errors; /* recv'd frame alignment error */
-// unsigned long rx_fifo_errors; /* recv'r fifo overrun */
-// unsigned long rx_missed_errors; /* receiver missed packet */
+/* unsigned long rx_frame_errors; /* recv'd frame alignment error */
+/* unsigned long rx_fifo_errors; /* recv'r fifo overrun */
+/* unsigned long rx_missed_errors; /* receiver missed packet */
/* detailed tx_errors */
-// unsigned long tx_fifo_errors;
+/* unsigned long tx_fifo_errors;
return &dev->stats;
}
/**
- * velocity_close - close adapter callback
+ * velocity_close - close adapter callback
* @dev: network device
*
* Callback from the network layer when the velocity is being
@@ -2523,11 +2523,11 @@ static int velocity_close(struct net_device *dev)
}
/**
- * velocity_xmit - transmit packet callback
+ * velocity_xmit - transmit packet callback
* @skb: buffer to transmit
* @dev: network device
*
- * Called by the networ layer to request a packet is queued to
+ * Called by the network layer to request a packet is queued to
* the velocity. Returns zero on success.
*/
static netdev_tx_t velocity_xmit(struct sk_buff *skb,
@@ -2636,7 +2636,7 @@ static const struct net_device_ops velocity_netdev_ops = {
.ndo_start_xmit = velocity_xmit,
.ndo_get_stats = velocity_get_stats,
.ndo_validate_addr = eth_validate_addr,
- .ndo_set_mac_address = eth_mac_addr,
+ .ndo_set_mac_address = eth_mac_addr,
.ndo_set_multicast_list = velocity_set_multi,
.ndo_change_mtu = velocity_change_mtu,
.ndo_do_ioctl = velocity_ioctl,
@@ -2646,7 +2646,7 @@ static const struct net_device_ops velocity_netdev_ops = {
};
/**
- * velocity_init_info - init private data
+ * velocity_init_info - init private data
* @pdev: PCI device
* @vptr: Velocity info
* @info: Board type
@@ -2668,7 +2668,7 @@ static void __devinit velocity_init_info(struct pci_dev *pdev,
}
/**
- * velocity_get_pci_info - retrieve PCI info for device
+ * velocity_get_pci_info - retrieve PCI info for device
* @vptr: velocity device
* @pdev: PCI device it matches
*
@@ -2706,7 +2706,7 @@ static int __devinit velocity_get_pci_info(struct velocity_info *vptr, struct pc
}
/**
- * velocity_print_info - per driver data
+ * velocity_print_info - per driver data
* @vptr: velocity
*
* Print per driver data as the kernel driver finds Velocity
@@ -2730,7 +2730,7 @@ static u32 velocity_get_link(struct net_device *dev)
/**
- * velocity_found1 - set up discovered velocity card
+ * velocity_found1 - set up discovered velocity card
* @pdev: PCI device
* @ent: PCI device table entry that matched
*
@@ -2877,7 +2877,7 @@ err_free_dev:
#ifdef CONFIG_PM
/**
- * wol_calc_crc - WOL CRC
+ * wol_calc_crc - WOL CRC
* @pattern: data pattern
* @mask_pattern: mask
*
@@ -2912,7 +2912,7 @@ static u16 wol_calc_crc(int size, u8 *pattern, u8 *mask_pattern)
}
/**
- * velocity_set_wol - set up for wake on lan
+ * velocity_set_wol - set up for wake on lan
* @vptr: velocity to set WOL status on
*
* Set a card up for wake on lan either by unicast or by
@@ -2957,7 +2957,7 @@ static int velocity_set_wol(struct velocity_info *vptr)
memcpy(arp->ar_tip, vptr->ip_addr, 4);
crc = wol_calc_crc((sizeof(struct arp_packet) + 7) / 8, buf,
- (u8 *) & mask_pattern[0][0]);
+ (u8 *) &mask_pattern[0][0]);
writew(crc, ®s->PatternCRC[0]);
writew(WOLCR_ARP_EN, ®s->WOLCRSet);
@@ -2997,7 +2997,7 @@ static int velocity_set_wol(struct velocity_info *vptr)
}
/**
- * velocity_save_context - save registers
+ * velocity_save_context - save registers
* @vptr: velocity
* @context: buffer for stored context
*
@@ -3058,7 +3058,7 @@ static int velocity_suspend(struct pci_dev *pdev, pm_message_t state)
}
/**
- * velocity_restore_context - restore registers
+ * velocity_restore_context - restore registers
* @vptr: velocity
* @context: buffer for stored context
*
@@ -3133,19 +3133,19 @@ static int velocity_resume(struct pci_dev *pdev)
* uses this to handle all our card discover and plugging
*/
static struct pci_driver velocity_driver = {
- .name = VELOCITY_NAME,
- .id_table = velocity_id_table,
- .probe = velocity_found1,
- .remove = __devexit_p(velocity_remove1),
+ .name = VELOCITY_NAME,
+ .id_table = velocity_id_table,
+ .probe = velocity_found1,
+ .remove = __devexit_p(velocity_remove1),
#ifdef CONFIG_PM
- .suspend = velocity_suspend,
- .resume = velocity_resume,
+ .suspend = velocity_suspend,
+ .resume = velocity_resume,
#endif
};
/**
- * velocity_ethtool_up - pre hook for ethtool
+ * velocity_ethtool_up - pre hook for ethtool
* @dev: network device
*
* Called before an ethtool operation. We need to make sure the
@@ -3160,7 +3160,7 @@ static int velocity_ethtool_up(struct net_device *dev)
}
/**
- * velocity_ethtool_down - post hook for ethtool
+ * velocity_ethtool_down - post hook for ethtool
* @dev: network device
*
* Called after an ethtool operation. Restore the chip back to D3
@@ -3352,8 +3352,7 @@ static int get_pending_timer_val(int val)
int mult_bits = val >> 6;
int mult = 1;
- switch (mult_bits)
- {
+ switch (mult_bits) {
case 1:
mult = 4; break;
case 2:
@@ -3454,7 +3453,7 @@ static const struct ethtool_ops velocity_ethtool_ops = {
.set_wol = velocity_ethtool_set_wol,
.get_msglevel = velocity_get_msglevel,
.set_msglevel = velocity_set_msglevel,
- .set_sg = ethtool_op_set_sg,
+ .set_sg = ethtool_op_set_sg,
.get_link = velocity_get_link,
.get_coalesce = velocity_get_coalesce,
.set_coalesce = velocity_set_coalesce,
@@ -3480,7 +3479,7 @@ static int velocity_netdev_event(struct notifier_block *nb, unsigned long notifi
#if defined(CONFIG_PM) && defined(CONFIG_INET)
static struct notifier_block velocity_inetaddr_notifier = {
- .notifier_call = velocity_netdev_event,
+ .notifier_call = velocity_netdev_event,
};
static void velocity_register_notifier(void)
@@ -3501,7 +3500,7 @@ static void velocity_unregister_notifier(void)
#endif /* defined(CONFIG_PM) && defined(CONFIG_INET) */
/**
- * velocity_init_module - load time function
+ * velocity_init_module - load time function
*
* Called when the velocity module is loaded. The PCI driver
* is registered with the PCI layer, and in turn will call
@@ -3520,7 +3519,7 @@ static int __init velocity_init_module(void)
}
/**
- * velocity_cleanup - module unload
+ * velocity_cleanup - module unload
*
* When the velocity hardware is unloaded this function is called.
* It will clean up the notifiers and the unregister the PCI
--
1.7.3.2
^ permalink raw reply related
* Re: [PATCH] via-velocity: Codestyle fixes
From: Joe Perches @ 2010-10-28 4:15 UTC (permalink / raw)
To: alex; +Cc: netdev, linux-kernel
In-Reply-To: <1288237684-30783-1-git-send-email-alex@wigen.net>
On Thu, 2010-10-28 at 13:48 +1000, alex@wigen.net wrote:
> From: Alexander Wigen <alex@wigen.net>
> @@ -1542,7 +1542,7 @@ static int velocity_alloc_rx_buf(struct velocity_info *vptr, int idx)
> * Fill in the descriptor to match
> */
>
> - *((u32 *) & (rd->rdesc0)) = 0;
> + *((u32 *) &(rd->rdesc0)) = 0;
I think this style change isn't really that good.
rdesc0 is:
struct rdesc0 {
__le16 RSR; /* Receive status */
__le16 len; /* bits 0--13; bit 15 - owner */
};
I think it more sensible to either set the fields
in the struct to 0 or to memset the struct to 0.
> @@ -1681,7 +1681,7 @@ static int velocity_init_td_ring(struct velocity_info *vptr)
> static void velocity_free_dma_rings(struct velocity_info *vptr)
> {
> const int size = vptr->options.numrx * sizeof(struct rx_desc) +
> - vptr->options.numtx * sizeof(struct tx_desc) * vptr->tx.numq;
> + vptr->options.numtx * sizeof(struct tx_desc) *vptr->tx.numq;
If checkpatch warns about this, it's wrong.
> @@ -2477,24 +2477,24 @@ static struct net_device_stats *velocity_get_stats(struct net_device *dev)
> dev->stats.rx_errors = vptr->mib_counter[HW_MIB_ifRxErrorPkts];
> dev->stats.rx_length_errors = vptr->mib_counter[HW_MIB_ifInRangeLengthErrors];
>
> -// unsigned long rx_dropped; /* no space in linux buffers */
> +/* unsigned long rx_dropped; /* no space in linux buffers */
why not just remove all these commented-out lines?
> @@ -2957,7 +2957,7 @@ static int velocity_set_wol(struct velocity_info *vptr)
> memcpy(arp->ar_tip, vptr->ip_addr, 4);
>
> crc = wol_calc_crc((sizeof(struct arp_packet) + 7) / 8, buf,
> - (u8 *) & mask_pattern[0][0]);
> + (u8 *) &mask_pattern[0][0]);
perhaps just (u8 *)mask_pattern?
> @@ -3454,7 +3453,7 @@ static const struct ethtool_ops velocity_ethtool_ops = {
> .set_wol = velocity_ethtool_set_wol,
> .get_msglevel = velocity_get_msglevel,
> .set_msglevel = velocity_set_msglevel,
> - .set_sg = ethtool_op_set_sg,
> + .set_sg = ethtool_op_set_sg,
bad alignment?
^ permalink raw reply
* [PATCH] ipv6: addrconf: don't remove address state on ifdown if the address is being kept
From: Lorenzo Colitti @ 2010-10-28 4:16 UTC (permalink / raw)
To: netdev; +Cc: Maciej Żenczykowski, David Miller, Brian Haley
Currently, addrconf_ifdown does not delete statically configured IPv6
addresses when the interface is brought down. The intent is that when
the interface comes back up the address will be usable again. However,
this doesn't actually work, because the system stops listening on the
corresponding solicited-node multicast address, so the address cannot
respond to neighbor solicitations and thus receive traffic. Also, the
code notifies the rest of the system that the address is being deleted
(e.g, RTM_DELADDR), even though it is not. Fix it so that none of this
state is updated if the address is being kept on the interface.
--- a/net/ipv6/addrconf.c 2010-10-20 13:30:22.000000000 -0700
+++ b/net/ipv6/addrconf.c 2010-10-27 21:04:39.000000000 -0700
@@ -2737,10 +2737,7 @@ static int addrconf_ifdown(struct net_de
/* Flag it for later restoration when link comes up */
ifa->flags |= IFA_F_TENTATIVE;
ifa->state = INET6_IFADDR_STATE_DAD;
-
- write_unlock_bh(&idev->lock);
-
- in6_ifa_hold(ifa);
+ continue;
} else {
list_del(&ifa->if_list);
^ permalink raw reply
* RE: [PATCH net-next] ixgbe: fix stats handling
From: Eric Dumazet @ 2010-10-28 4:24 UTC (permalink / raw)
To: Tantilov, Emil S
Cc: David Miller, Waskiewicz Jr, Peter P, Kirsher, Jeffrey T,
Brattain, Ross B, netdev
In-Reply-To: <EA929A9653AAE14F841771FB1DE5A136602807B460@rrsmsx501.amr.corp.intel.com>
Le mercredi 27 octobre 2010 à 16:35 -0600, Tantilov, Emil S a écrit :
> >-----Original Message-----
> >From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> >Sent: Monday, October 11, 2010 5:17 AM
> >To: David Miller; Waskiewicz Jr, Peter P; Tantilov, Emil S; Kirsher,
> >Jeffrey T
> >Cc: netdev
> >Subject: [PATCH net-next] ixgbe: fix stats handling
> >
> >Hi
> >
> >I am sending this patch for Intel people review/test and acknowledge.
> >
> >Thanks !
> >
> >[PATCH net-next] ixgbe: fix stats handling
> >
> >Current ixgbe stats have following problems :
> >
> >- Not 64 bit safe (on 32bit arches)
> >
> >- Not safe in ixgbe_clean_rx_irq() :
> > All cpus dirty a common location (netdev->stats.rx_bytes &
> >netdev->stats.rx_packets) without proper synchronization.
> > This slow down a bit multiqueue operations, and possibly miss some
> >updates.
> >
> >Fixes :
> >
> >Implement ndo_get_stats64() method to provide accurate 64bit rx|tx
> >bytes/packets counters, using 64bit safe infrastructure.
> >
> >ixgbe_get_ethtool_stats() also use this infrastructure to provide 64bit
> >safe counters.
> >
> >Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> >CC: Peter Waskiewicz <peter.p.waskiewicz.jr@intel.com>
> >CC: Emil Tantilov <emil.s.tantilov@intel.com>
> >CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> >---
> > drivers/net/ixgbe/ixgbe.h | 3 +-
> > drivers/net/ixgbe/ixgbe_ethtool.c | 29 +++++++++++---------
> > drivers/net/ixgbe/ixgbe_main.c | 40 +++++++++++++++++++++++++---
> > 3 files changed, 56 insertions(+), 16 deletions(-)
> >
> >diff --git a/drivers/net/ixgbe/ixgbe.h b/drivers/net/ixgbe/ixgbe.h
> >index a8c47b0..944d9e2 100644
> >--- a/drivers/net/ixgbe/ixgbe.h
> >+++ b/drivers/net/ixgbe/ixgbe.h
> >@@ -180,8 +180,9 @@ struct ixgbe_ring {
> > */
> >
> > struct ixgbe_queue_stats stats;
> >- unsigned long reinit_state;
> >+ struct u64_stats_sync syncp;
> > int numa_node;
> >+ unsigned long reinit_state;
> > u64 rsc_count; /* stat for coalesced packets */
> > u64 rsc_flush; /* stats for flushed packets */
> > u32 restart_queue; /* track tx queue restarts */
> >diff --git a/drivers/net/ixgbe/ixgbe_ethtool.c
> >b/drivers/net/ixgbe/ixgbe_ethtool.c
> >index d4ac943..3c7f15d 100644
> >--- a/drivers/net/ixgbe/ixgbe_ethtool.c
> >+++ b/drivers/net/ixgbe/ixgbe_ethtool.c
> >@@ -999,12 +999,11 @@ static void ixgbe_get_ethtool_stats(struct net_device
> >*netdev,
> > struct ethtool_stats *stats, u64
> >*data)
> > {
> > struct ixgbe_adapter *adapter = netdev_priv(netdev);
> >- u64 *queue_stat;
> >- int stat_count = sizeof(struct ixgbe_queue_stats) / sizeof(u64);
> > struct rtnl_link_stats64 temp;
> > const struct rtnl_link_stats64 *net_stats;
> >- int j, k;
> >- int i;
> >+ unsigned int start;
> >+ struct ixgbe_ring *ring;
> >+ int i, j;
> > char *p = NULL;
> >
> > ixgbe_update_stats(adapter);
> >@@ -1025,16 +1024,22 @@ static void ixgbe_get_ethtool_stats(struct
> >net_device *netdev,
> > sizeof(u64)) ? *(u64 *)p : *(u32 *)p;
> > }
> > for (j = 0; j < adapter->num_tx_queues; j++) {
> >- queue_stat = (u64 *)&adapter->tx_ring[j]->stats;
> >- for (k = 0; k < stat_count; k++)
> >- data[i + k] = queue_stat[k];
> >- i += k;
> >+ ring = adapter->tx_ring[j];
> >+ do {
> >+ start = u64_stats_fetch_begin_bh(&ring->syncp);
> >+ data[i] = ring->stats.packets;
> >+ data[i+1] = ring->stats.bytes;
> >+ } while (u64_stats_fetch_retry_bh(&ring->syncp, start));
> >+ i += 2;
> > }
> > for (j = 0; j < adapter->num_rx_queues; j++) {
> >- queue_stat = (u64 *)&adapter->rx_ring[j]->stats;
> >- for (k = 0; k < stat_count; k++)
> >- data[i + k] = queue_stat[k];
> >- i += k;
> >+ ring = adapter->rx_ring[j];
> >+ do {
> >+ start = u64_stats_fetch_begin_bh(&ring->syncp);
> >+ data[i] = ring->stats.packets;
> >+ data[i+1] = ring->stats.bytes;
> >+ } while (u64_stats_fetch_retry_bh(&ring->syncp, start));
> >+ i += 2;
> > }
> > if (adapter->flags & IXGBE_FLAG_DCB_ENABLED) {
> > for (j = 0; j < MAX_TX_PACKET_BUFFERS; j++) {
> >diff --git a/drivers/net/ixgbe/ixgbe_main.c
> >b/drivers/net/ixgbe/ixgbe_main.c
> >index 95dbf60..1efbcde 100644
> >--- a/drivers/net/ixgbe/ixgbe_main.c
> >+++ b/drivers/net/ixgbe/ixgbe_main.c
> >@@ -824,8 +824,10 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector
> >*q_vector,
> >
> > tx_ring->total_bytes += total_bytes;
> > tx_ring->total_packets += total_packets;
> >+ u64_stats_update_begin(&tx_ring->syncp);
> > tx_ring->stats.packets += total_packets;
> > tx_ring->stats.bytes += total_bytes;
> >+ u64_stats_update_end(&tx_ring->syncp);
> > return count < tx_ring->work_limit;
> > }
> >
> >@@ -1172,7 +1174,6 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector
> >*q_vector,
> > int *work_done, int work_to_do)
> > {
> > struct ixgbe_adapter *adapter = q_vector->adapter;
> >- struct net_device *netdev = adapter->netdev;
> > struct pci_dev *pdev = adapter->pdev;
> > union ixgbe_adv_rx_desc *rx_desc, *next_rxd;
> > struct ixgbe_rx_buffer *rx_buffer_info, *next_buffer;
> >@@ -1298,8 +1299,10 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector
> >*q_vector,
> > rx_ring->rsc_count++;
> > rx_ring->rsc_flush++;
> > }
> >+ u64_stats_update_begin(&rx_ring->syncp);
> > rx_ring->stats.packets++;
> > rx_ring->stats.bytes += skb->len;
> >+ u64_stats_update_end(&rx_ring->syncp);
> > } else {
> > if (rx_ring->flags & IXGBE_RING_RX_PS_ENABLED) {
> > rx_buffer_info->skb = next_buffer->skb;
> >@@ -1375,8 +1378,6 @@ next_desc:
> >
> > rx_ring->total_packets += total_rx_packets;
> > rx_ring->total_bytes += total_rx_bytes;
> >- netdev->stats.rx_bytes += total_rx_bytes;
> >- netdev->stats.rx_packets += total_rx_packets;
> >
> > return cleaned;
> > }
> >@@ -6559,6 +6560,38 @@ static void ixgbe_netpoll(struct net_device *netdev)
> > }
> > #endif
> >
> >+static struct rtnl_link_stats64 *ixgbe_get_stats64(struct net_device
> >*netdev,
> >+ struct rtnl_link_stats64 *stats)
> >+{
> >+ struct ixgbe_adapter *adapter = netdev_priv(netdev);
> >+ int i;
> >+
> >+ /* accurate rx/tx bytes/packets stats */
> >+ dev_txq_stats_fold(netdev, stats);
> >+ for (i = 0; i < adapter->num_rx_queues; i++) {
> >+ struct ixgbe_ring *ring = adapter->rx_ring[i];
> >+ u64 bytes, packets;
> >+ unsigned int start;
> >+
> >+ do {
> >+ start = u64_stats_fetch_begin_bh(&ring->syncp);
> >+ packets = ring->stats.packets;
> >+ bytes = ring->stats.bytes;
> >+ } while (u64_stats_fetch_retry_bh(&ring->syncp, start));
> >+ stats->rx_packets += packets;
> >+ stats->rx_bytes += bytes;
> >+ }
> >+
> >+ /* following stats updated by ixgbe_watchdog_task() */
> >+ stats->multicast = netdev->stats.multicast;
> >+ stats->rx_errors = netdev->stats.rx_errors;
> >+ stats->rx_length_errors = netdev->stats.rx_length_errors;
> >+ stats->rx_crc_errors = netdev->stats.rx_crc_errors;
> >+ stats->rx_missed_errors = netdev->stats.rx_missed_errors;
> >+ return stats;
> >+}
> >+
> >+
> > static const struct net_device_ops ixgbe_netdev_ops = {
> > .ndo_open = ixgbe_open,
> > .ndo_stop = ixgbe_close,
> >@@ -6578,6 +6611,7 @@ static const struct net_device_ops ixgbe_netdev_ops =
> >{
> > .ndo_set_vf_vlan = ixgbe_ndo_set_vf_vlan,
> > .ndo_set_vf_tx_rate = ixgbe_ndo_set_vf_bw,
> > .ndo_get_vf_config = ixgbe_ndo_get_vf_config,
> >+ .ndo_get_stats64 = ixgbe_get_stats64,
> > #ifdef CONFIG_NET_POLL_CONTROLLER
> > .ndo_poll_controller = ixgbe_netpoll,
> > #endif
> >
>
> Eric,
>
> We are seeing intermittent hangs on ia32 arch which seem to point to this patch:
>
> BUG: unable to handle kernel NULL pointer dereference at 00000040
> IP: [<f7f6b537>] ixgbe_get_stats64+0x47/0x120 [ixgbe]
> *pdpt = 000000002dc83001 *pde = 000000032d7e5067
> Oops: 0000 [#2] SMP
> last sysfs file: /sys/devices/system/cpu/cpu23/cache/index2/shared_cpu_map
> Modules linked in: act_skbedit cls_u32 sch_multiq ixgbe mdio netconsole configfs autofs4 8021q garp stp llc sunrpc ipv6 e
> xt3 jbd dm_mirror dm_region_hash dm_log dm_round_robin dm_multipath power_meter hwmon sg ses enclosure dcdbas pcspkr serio_raw iTCO_wdt iTCO_vendor_support io
> atdma dca i7core_edac edac_core bnx2 ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix megaraid_sas dm_mod [last unloaded: mperf
> ]
>
> Pid: 1939, comm: irqbalance Tainted: G D W 2.6.36-rc7-upstream-net-next-2.6-ixgbe-queue-i386-g55e1a84 #1 09CGW2/Po
> werEdge T610
> EIP: 0060:[<f7f6b537>] EFLAGS: 00010206 CPU: 0
> EIP is at ixgbe_get_stats64+0x47/0x120 [ixgbe]
> EAX: 00000000 EBX: ecc45e4c ECX: ebea0400 EDX: 00000000
> ESI: ebea0000 EDI: 00000018 EBP: f7f846a0 ESP: ecc45d88
> DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> Process irqbalance (pid: 1939, ti=ecc44000 task=efc63a50 task.ti=ecc44000)
> Stack:
> ecc45e4c 00000000 ebea0400 ebea0000 ecc45e4c ebea0000 ecc45f04 f7f846a0
> <0> c0750593 ebea0000 edfec340 ebea0000 000002b5 c075063a edfec340 c0993eec
> <0> ebe78000 0000021c 00000000 00000009 00000000 00000000 00000000 00000000
> Call Trace:
> [<c0750593>] ? dev_get_stats+0x33/0xc0
> [<c075063a>] ? dev_seq_printf_stats+0x1a/0x180
> [<c07507aa>] ? dev_seq_show+0xa/0x20
> [<c052398f>] ? seq_read+0x22f/0x3d0
> [<c0523760>] ? seq_read+0x0/0x3d0
> [<c054fdde>] ? proc_reg_read+0x5e/0x90
> [<c054fd80>] ? proc_reg_read+0x0/0x90
> [<c050a1dd>] ? vfs_read+0x9d/0x160
> [<c049d4ef>] ? audit_syscall_entry+0x20f/0x230
> [<c050a971>] ? sys_read+0x41/0x70
> [<c0409cdf>] ? sysenter_do_call+0x12/0x28
> Code: 60 4f 7e c8 8b 44 24 08 8b b8 20 06 00 00 85 ff 7e 63 c7 44 24 04 00 00 00 00 66 90 8b 54 24 04 8b 4c 24 08 8b 84 9
> 1 00 05 00 00 <8b> 50 40 eb 06 8d 74 26 00 89 ca f6 c2 01 0f 85 ae 00 00 00 8b
> EIP: [<f7f6b537>] ixgbe_get_stats64+0x47/0x120 [ixgbe] SS:ESP 0068:ecc45d88
> CR2: 0000000000000040
> ---[ end trace 51ea89f4e57f54f1 ]---
>
> Emil
Code disassembly makes no sense to me : this doesnt match C code at all
60 4f 7e c8
8b 44 24 08 mov 0x08(%esp),%eax
8b b8 20 06 00 00 mov 0x620(%eax),%edi
85 ff test %edi,%edi
7e 63 jle +0x63
c7 44 24 04 00 00 00 00 movl $0,0x4(%esp)
66 90 xchg %ax,%ax
8b 54 24 04 mov 0x04(%esp),%edx (0 in your report)
8b 4c 24 08 mov 0x08(%esp),%ecx (ebea0400 in your rep)
8b 84 91 00 05 00 00 mov 0x500(,),eax
<8b> 50 40 mov 0x40(%eax),%edx NULL pointer deref
eb 06 jmp +6
8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
89 ca mov %ecx,%edx
f6 c2 01 test $0x1,%dl
0f 85 ae 00 00 00 jne
8b
Could you disassemble your ixgbe_get_stats64() function ?
Thanks
Mine is :
00000e29 <ixgbe_get_stats64>:
e29: 55 push %ebp
e2a: 89 e5 mov %esp,%ebp
e2c: 57 push %edi
e2d: 56 push %esi
e2e: 89 c6 mov %eax,%esi
e30: 53 push %ebx
e31: 89 d3 mov %edx,%ebx
e33: 83 ec 18 sub $0x18,%esp
e36: 8d b8 40 03 00 00 lea 0x340(%eax),%edi
e3c: e8 fc ff ff ff call dev_txq_stats_fold
e41: 31 c0 xor %eax,%eax
e43: eb 54 jmp e99 <ixgbe_get_stats64+0x70>
e45: 8b 94 87 40 0d 00 00 mov 0xd40(%edi,%eax,4),%edx
e4c: 89 75 e0 mov %esi,-0x20(%ebp)
e4f: 89 5d dc mov %ebx,-0x24(%ebp)
e52: 8b 4a 34 mov 0x34(%edx),%ecx
e55: f6 c1 01 test $0x1,%cl
e58: 74 04 je e5e <ixgbe_get_stats64+0x35>
e5a: f3 90 pause
e5c: eb f4 jmp e52 <ixgbe_get_stats64+0x29>
e5e: 8b 5a 24 mov 0x24(%edx),%ebx
e61: 8b 72 28 mov 0x28(%edx),%esi
e64: 89 5d ec mov %ebx,-0x14(%ebp)
e67: 8b 5a 2c mov 0x2c(%edx),%ebx
e6a: 89 75 f0 mov %esi,-0x10(%ebp)
e6d: 8b 72 30 mov 0x30(%edx),%esi
e70: 89 5d e4 mov %ebx,-0x1c(%ebp)
e73: 89 75 e8 mov %esi,-0x18(%ebp)
e76: 39 4a 34 cmp %ecx,0x34(%edx)
e79: 75 d7 jne e52 <ixgbe_get_stats64+0x29>
e7b: 8b 5d dc mov -0x24(%ebp),%ebx
e7e: 8b 55 ec mov -0x14(%ebp),%edx
e81: 8b 75 e0 mov -0x20(%ebp),%esi
e84: 8b 4d f0 mov -0x10(%ebp),%ecx
e87: 01 13 add %edx,(%ebx)
e89: 8b 55 e4 mov -0x1c(%ebp),%edx
e8c: 11 4b 04 adc %ecx,0x4(%ebx)
e8f: 01 53 10 add %edx,0x10(%ebx)
e92: 8b 4d e8 mov -0x18(%ebp),%ecx
e95: 11 4b 14 adc %ecx,0x14(%ebx)
e98: 40 inc %eax
e99: 3b 87 40 0e 00 00 cmp 0xe40(%edi),%eax
e9f: 7c a4 jl e45 <ixgbe_get_stats64+0x1c>
ea1: 8b 8e 90 00 00 00 mov 0x90(%esi),%ecx
ea7: c7 43 44 00 00 00 00 movl $0x0,0x44(%ebx)
eae: 89 4b 40 mov %ecx,0x40(%ebx)
eb1: 8b 86 80 00 00 00 mov 0x80(%esi),%eax
eb7: c7 43 24 00 00 00 00 movl $0x0,0x24(%ebx)
ebe: 89 43 20 mov %eax,0x20(%ebx)
ec1: 8b 96 98 00 00 00 mov 0x98(%esi),%edx
ec7: c7 43 54 00 00 00 00 movl $0x0,0x54(%ebx)
ece: 89 53 50 mov %edx,0x50(%ebx)
ed1: 8b 8e a0 00 00 00 mov 0xa0(%esi),%ecx
ed7: c7 43 64 00 00 00 00 movl $0x0,0x64(%ebx)
ede: 89 4b 60 mov %ecx,0x60(%ebx)
ee1: 8b 86 ac 00 00 00 mov 0xac(%esi),%eax
ee7: c7 43 7c 00 00 00 00 movl $0x0,0x7c(%ebx)
eee: 89 43 78 mov %eax,0x78(%ebx)
ef1: 83 c4 18 add $0x18,%esp
ef4: 89 d8 mov %ebx,%eax
ef6: 5b pop %ebx
ef7: 5e pop %esi
ef8: 5f pop %edi
ef9: 5d pop %ebp
efa: c3 ret
No 0x40(ptr) deref for me
Could adapter->rx_ring[i] becomes NULL sometime in your driver ?
If yes, a lock/protection is needed (same for ethtool stats I guess)
^ permalink raw reply
* Re: [RFC PATCH 1/1] vhost: TX used buffer guest signal accumulation
From: Shirley Ma @ 2010-10-28 4:40 UTC (permalink / raw)
To: mst@redhat.com; +Cc: David Miller, netdev, kvm, linux-kernel
In-Reply-To: <1288216693.17571.38.camel@localhost.localdomain>
Resubmit this patch for fixing some minor error (white space, typo).
Signed-off-by: Shirley Ma <xma@us.ibm.com>
---
drivers/vhost/net.c | 20 +++++++++++++++++++-
drivers/vhost/vhost.c | 32 ++++++++++++++++++++++++++++++++
drivers/vhost/vhost.h | 3 +++
3 files changed, 54 insertions(+), 1 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 4b4da5b..3eb8016 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -128,6 +128,7 @@ static void handle_tx(struct vhost_net *net)
int err, wmem;
size_t hdr_size;
struct socket *sock;
+ int max_pend = vq->num - (vq->num >> 2);
sock = rcu_dereference_check(vq->private_data,
lockdep_is_held(&vq->mutex));
@@ -198,7 +199,24 @@ static void handle_tx(struct vhost_net *net)
if (err != len)
pr_debug("Truncated TX packet: "
" len %d != %zd\n", err, len);
- vhost_add_used_and_signal(&net->dev, vq, head, 0);
+ /*
+ * if no pending buffer size allocate, signal used buffer
+ * one by one, otherwise, signal used buffer when reaching
+ * 3/4 ring size to reduce CPU utilization.
+ */
+ if (unlikely(vq->pend))
+ vhost_add_used_and_signal(&net->dev, vq, head, 0);
+ else {
+ vq->pend[vq->num_pend].id = head;
+ vq->pend[vq->num_pend].len = 0;
+ ++vq->num_pend;
+ if (vq->num_pend == max_pend) {
+ vhost_add_used_and_signal_n(&net->dev, vq,
+ vq->pend,
+ vq->num_pend);
+ vq->num_pend = 0;
+ }
+ }
total_len += len;
if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
vhost_poll_queue(&vq->poll);
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 94701ff..f2f3288 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -170,6 +170,16 @@ static void vhost_vq_reset(struct vhost_dev *dev,
vq->call_ctx = NULL;
vq->call = NULL;
vq->log_ctx = NULL;
+ /* signal pending used buffers */
+ if (vq->pend) {
+ if (vq->num_pend != 0) {
+ vhost_add_used_and_signal_n(dev, vq, vq->pend,
+ vq->num_pend);
+ vq->num_pend = 0;
+ }
+ kfree(vq->pend);
+ }
+ vq->pend = NULL;
}
static int vhost_worker(void *data)
@@ -273,7 +283,14 @@ long vhost_dev_init(struct vhost_dev *dev,
dev->vqs[i].heads = NULL;
dev->vqs[i].dev = dev;
mutex_init(&dev->vqs[i].mutex);
+ dev->vqs[i].num_pend = 0;
+ dev->vqs[i].pend = NULL;
vhost_vq_reset(dev, dev->vqs + i);
+ /* signal 3/4 of ring size used buffers */
+ dev->vqs[i].pend = kmalloc((dev->vqs[i].num -
+ (dev->vqs[i].num >> 2)) *
+ sizeof *dev->vqs[i].pend,
+ GFP_KERNEL);
if (dev->vqs[i].handle_kick)
vhost_poll_init(&dev->vqs[i].poll,
dev->vqs[i].handle_kick, POLLIN, dev);
@@ -599,6 +616,21 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
r = -EINVAL;
break;
}
+ if (vq->num != s.num) {
+ /* signal used buffers first */
+ if (vq->pend) {
+ if (vq->num_pend != 0) {
+ vhost_add_used_and_signal_n(vq->dev, vq,
+ vq->pend,
+ vq->num_pend);
+ vq->num_pend = 0;
+ }
+ kfree(vq->pend);
+ }
+ /* realloc pending used buffers size */
+ vq->pend = kmalloc((s.num - (s.num >> 2)) *
+ sizeof *vq->pend, GFP_KERNEL);
+ }
vq->num = s.num;
break;
case VHOST_SET_VRING_BASE:
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 073d06a..78949c0 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -108,6 +108,9 @@ struct vhost_virtqueue {
/* Log write descriptors */
void __user *log_base;
struct vhost_log *log;
+ /* delay multiple used buffers to signal once */
+ int num_pend;
+ struct vring_used_elem *pend;
};
struct vhost_dev {
^ permalink raw reply related
* [PATCH] via-velocity: Codestyle fixes
From: alex @ 2010-10-28 5:00 UTC (permalink / raw)
To: netdev; +Cc: linux-kernel, alex
From: Alexander Wigen <alex@wigen.net>
checkpatch.pl --terse --file drivers/net/via-velocity.c
Before: total: 13 errors, 106 warnings, 3537 lines checked
After: total: 2 errors, 96 warnings, 3526 lines checked
Signed-off-by: Alexander Wigen <alex@wigen.net>
---
drivers/net/via-velocity.c | 181 +++++++++++++++++++++-----------------------
1 files changed, 85 insertions(+), 96 deletions(-)
diff --git a/drivers/net/via-velocity.c b/drivers/net/via-velocity.c
index cab96ad..fe80537 100644
--- a/drivers/net/via-velocity.c
+++ b/drivers/net/via-velocity.c
@@ -84,7 +84,7 @@ static int velocity_nics;
static int msglevel = MSG_LEVEL_INFO;
/**
- * mac_get_cam_mask - Read a CAM mask
+ * mac_get_cam_mask - Read a CAM mask
* @regs: register block for this velocity
* @mask: buffer to store mask
*
@@ -113,7 +113,7 @@ static void mac_get_cam_mask(struct mac_regs __iomem *regs, u8 *mask)
/**
- * mac_set_cam_mask - Set a CAM mask
+ * mac_set_cam_mask - Set a CAM mask
* @regs: register block for this velocity
* @mask: CAM mask to load
*
@@ -156,7 +156,7 @@ static void mac_set_vlan_cam_mask(struct mac_regs __iomem *regs, u8 *mask)
}
/**
- * mac_set_cam - set CAM data
+ * mac_set_cam - set CAM data
* @regs: register block of this velocity
* @idx: Cam index
* @addr: 2 or 6 bytes of CAM data
@@ -211,11 +211,11 @@ static void mac_set_vlan_cam(struct mac_regs __iomem *regs, int idx,
/**
- * mac_wol_reset - reset WOL after exiting low power
+ * mac_wol_reset - reset WOL after exiting low power
* @regs: register block of this velocity
*
* Called after we drop out of wake on lan mode in order to
- * reset the Wake on lan features. This function doesn't restore
+ * reset the Wake On Lan features. This function doesn't restore
* the rest of the logic from the result of sleep/wakeup
*/
static void mac_wol_reset(struct mac_regs __iomem *regs)
@@ -342,7 +342,7 @@ VELOCITY_PARAM(ValPktLen, "Receiving or Drop invalid 802.3 frame");
1: Wake up if link status is on/off.
2: Wake up if recevied an arp packet.
4: Wake up if recevied any unicast packet.
- Those value can be sumed up to support more than one option.
+ Those value can be summed up to support more than one option.
*/
VELOCITY_PARAM(wol_opts, "Wake On Lan options");
@@ -370,7 +370,7 @@ static DEFINE_PCI_DEVICE_TABLE(velocity_id_table) = {
MODULE_DEVICE_TABLE(pci, velocity_id_table);
/**
- * get_chip_name - identifier to name
+ * get_chip_name - identifier to name
* @id: chip identifier
*
* Given a chip identifier return a suitable description. Returns
@@ -386,7 +386,7 @@ static const char __devinit *get_chip_name(enum chip_type chip_id)
}
/**
- * velocity_remove1 - device unplug
+ * velocity_remove1 - device unplug
* @pdev: PCI device being removed
*
* Device unload callback. Called on an unplug or on module
@@ -409,7 +409,7 @@ static void __devexit velocity_remove1(struct pci_dev *pdev)
}
/**
- * velocity_set_int_opt - parser for integer options
+ * velocity_set_int_opt - parser for integer options
* @opt: pointer to option value
* @val: value the user requested (or -1 for default)
* @min: lowest value allowed
@@ -438,7 +438,7 @@ static void __devinit velocity_set_int_opt(int *opt, int val, int min, int max,
}
/**
- * velocity_set_bool_opt - parser for boolean options
+ * velocity_set_bool_opt - parser for boolean options
* @opt: pointer to option value
* @val: value the user requested (or -1 for default)
* @def: default value (yes/no)
@@ -467,7 +467,7 @@ static void __devinit velocity_set_bool_opt(u32 *opt, int val, int def, u32 flag
}
/**
- * velocity_get_options - set options on device
+ * velocity_get_options - set options on device
* @opts: option structure for the device
* @index: index of option to use in module options array
* @devname: device name
@@ -492,7 +492,7 @@ static void __devinit velocity_get_options(struct velocity_opt *opts, int index,
}
/**
- * velocity_init_cam_filter - initialise CAM
+ * velocity_init_cam_filter - initialize CAM
* @vptr: velocity to program
*
* Initialize the content addressable memory used for filters. Load
@@ -507,8 +507,8 @@ static void velocity_init_cam_filter(struct velocity_info *vptr)
WORD_REG_BITS_ON(MCFG_VIDFR, ®s->MCFG);
/* Disable all CAMs */
- memset(vptr->vCAMmask, 0, sizeof(u8) * 8);
- memset(vptr->mCAMmask, 0, sizeof(u8) * 8);
+ memset(vptr->vCAMmask, 0, sizeof(u8) *8);
+ memset(vptr->mCAMmask, 0, sizeof(u8) *8);
mac_set_vlan_cam_mask(regs, vptr->vCAMmask);
mac_set_cam_mask(regs, vptr->mCAMmask);
@@ -564,7 +564,7 @@ static void velocity_init_rx_ring_indexes(struct velocity_info *vptr)
}
/**
- * velocity_rx_reset - handle a receive reset
+ * velocity_rx_reset - handle a receive reset
* @vptr: velocity we are resetting
*
* Reset the ownership and status for the receive ring side.
@@ -591,7 +591,7 @@ static void velocity_rx_reset(struct velocity_info *vptr)
}
/**
- * velocity_get_opt_media_mode - get media selection
+ * velocity_get_opt_media_mode - get media selection
* @vptr: velocity adapter
*
* Get the media mode stored in EEPROM or module options and load
@@ -627,7 +627,7 @@ static u32 velocity_get_opt_media_mode(struct velocity_info *vptr)
}
/**
- * safe_disable_mii_autopoll - autopoll off
+ * safe_disable_mii_autopoll - autopoll off
* @regs: velocity registers
*
* Turn off the autopoll and wait for it to disable on the chip
@@ -646,7 +646,7 @@ static void safe_disable_mii_autopoll(struct mac_regs __iomem *regs)
}
/**
- * enable_mii_autopoll - turn on autopolling
+ * enable_mii_autopoll - turn on autopolling
* @regs: velocity registers
*
* Enable the MII link status autopoll feature on the Velocity
@@ -676,7 +676,7 @@ static void enable_mii_autopoll(struct mac_regs __iomem *regs)
}
/**
- * velocity_mii_read - read MII data
+ * velocity_mii_read - read MII data
* @regs: velocity registers
* @index: MII register index
* @data: buffer for received data
@@ -712,7 +712,7 @@ static int velocity_mii_read(struct mac_regs __iomem *regs, u8 index, u16 *data)
/**
- * mii_check_media_mode - check media state
+ * mii_check_media_mode - check media state
* @regs: velocity registers
*
* Check the current MII status and determine the link status
@@ -755,7 +755,7 @@ static u32 mii_check_media_mode(struct mac_regs __iomem *regs)
}
/**
- * velocity_mii_write - write MII data
+ * velocity_mii_write - write MII data
* @regs: velocity registers
* @index: MII register index
* @data: 16bit data for the MII register
@@ -794,7 +794,7 @@ static int velocity_mii_write(struct mac_regs __iomem *regs, u8 mii_addr, u16 da
}
/**
- * set_mii_flow_control - flow control setup
+ * set_mii_flow_control - flow control setup
* @vptr: velocity interface
*
* Set up the flow control on this interface according to
@@ -829,10 +829,10 @@ static void set_mii_flow_control(struct velocity_info *vptr)
}
/**
- * mii_set_auto_on - autonegotiate on
+ * mii_set_auto_on - autonegotiate on
* @vptr: velocity
*
- * Enable autonegotation on this interface
+ * Enable autonegotiation on this interface
*/
static void mii_set_auto_on(struct velocity_info *vptr)
{
@@ -879,7 +879,7 @@ static u32 check_connection_type(struct mac_regs __iomem *regs)
/**
- * velocity_set_media_mode - set media mode
+ * velocity_set_media_mode - set media mode
* @mii_status: old MII link state
*
* Check the media link state and configure the flow control
@@ -999,7 +999,7 @@ static int velocity_set_media_mode(struct velocity_info *vptr, u32 mii_status)
}
/**
- * velocity_print_link_status - link status reporting
+ * velocity_print_link_status - link status reporting
* @vptr: velocity to report on
*
* Turn the link status of the velocity card into a kernel log
@@ -1050,7 +1050,7 @@ static void velocity_print_link_status(struct velocity_info *vptr)
}
/**
- * enable_flow_control_ability - flow control
+ * enable_flow_control_ability - flow control
* @vptr: veloity to configure
*
* Set up flow control according to the flow control options
@@ -1102,7 +1102,7 @@ static void enable_flow_control_ability(struct velocity_info *vptr)
}
/**
- * velocity_soft_reset - soft reset
+ * velocity_soft_reset - soft reset
* @vptr: velocity to reset
*
* Kick off a soft reset of the velocity adapter and then poll
@@ -1131,7 +1131,7 @@ static int velocity_soft_reset(struct velocity_info *vptr)
}
/**
- * velocity_set_multi - filter list change callback
+ * velocity_set_multi - filter list change callback
* @dev: network device
*
* Called by the network layer when the filter lists need to change
@@ -1181,7 +1181,7 @@ static void velocity_set_multi(struct net_device *dev)
*/
/**
- * mii_init - set up MII
+ * mii_init - set up MII
* @vptr: velocity adapter
* @mii_status: links tatus
*
@@ -1250,7 +1250,7 @@ static void mii_init(struct velocity_info *vptr, u32 mii_status)
}
/**
- * setup_queue_timers - Setup interrupt timers
+ * setup_queue_timers - Setup interrupt timers
*
* Setup interrupt frequency during suppression (timeout if the frame
* count isn't filled).
@@ -1273,7 +1273,7 @@ static void setup_queue_timers(struct velocity_info *vptr)
}
}
/**
- * setup_adaptive_interrupts - Setup interrupt suppression
+ * setup_adaptive_interrupts - Setup interrupt suppression
*
* @vptr velocity adapter
*
@@ -1311,11 +1311,11 @@ static void setup_adaptive_interrupts(struct velocity_info *vptr)
}
/**
- * velocity_init_registers - initialise MAC registers
+ * velocity_init_registers - initialise MAC registers
* @vptr: velocity to init
* @type: type of initialisation (hot or cold)
*
- * Initialise the MAC on a reset or on first set up on the
+ * Initialize the MAC on a reset or on first set up on the
* hardware.
*/
static void velocity_init_registers(struct velocity_info *vptr,
@@ -1459,7 +1459,7 @@ static void velocity_give_many_rx_descs(struct velocity_info *vptr)
}
/**
- * velocity_init_dma_rings - set up DMA rings
+ * velocity_init_dma_rings - set up DMA rings
* @vptr: Velocity to set up
*
* Allocate PCI mapped DMA rings for the receive and transmit layer
@@ -1511,7 +1511,7 @@ static void velocity_set_rxbufsize(struct velocity_info *vptr, int mtu)
}
/**
- * velocity_alloc_rx_buf - allocate aligned receive buffer
+ * velocity_alloc_rx_buf - allocate aligned receive buffer
* @vptr: velocity
* @idx: ring index
*
@@ -1542,7 +1542,7 @@ static int velocity_alloc_rx_buf(struct velocity_info *vptr, int idx)
* Fill in the descriptor to match
*/
- *((u32 *) & (rd->rdesc0)) = 0;
+ *((u32 *) &(rd->rdesc0)) = 0;
rd->size = cpu_to_le16(vptr->rx.buf_sz) | RX_INTEN;
rd->pa_low = cpu_to_le32(rd_info->skb_dma);
rd->pa_high = 0;
@@ -1578,7 +1578,7 @@ static int velocity_rx_refill(struct velocity_info *vptr)
}
/**
- * velocity_free_rd_ring - free receive ring
+ * velocity_free_rd_ring - free receive ring
* @vptr: velocity to clean up
*
* Free the receive buffers for each ring slot and any
@@ -1614,7 +1614,7 @@ static void velocity_free_rd_ring(struct velocity_info *vptr)
/**
- * velocity_init_rd_ring - set up receive ring
+ * velocity_init_rd_ring - set up receive ring
* @vptr: velocity to configure
*
* Allocate and set up the receive buffers for each ring slot and
@@ -1644,7 +1644,7 @@ out:
}
/**
- * velocity_init_td_ring - set up transmit ring
+ * velocity_init_td_ring - set up transmit ring
* @vptr: velocity
*
* Set up the transmit ring and chain the ring pointers together.
@@ -1673,7 +1673,7 @@ static int velocity_init_td_ring(struct velocity_info *vptr)
}
/**
- * velocity_free_dma_rings - free PCI ring pointers
+ * velocity_free_dma_rings - free PCI ring pointers
* @vptr: Velocity to free from
*
* Clean up the PCI ring buffers allocated to this velocity.
@@ -1715,7 +1715,7 @@ err_free_dma_rings_0:
}
/**
- * velocity_free_tx_buf - free transmit buffer
+ * velocity_free_tx_buf - free transmit buffer
* @vptr: velocity
* @tdinfo: buffer
*
@@ -1776,7 +1776,7 @@ static void velocity_free_td_ring_entry(struct velocity_info *vptr,
}
/**
- * velocity_free_td_ring - free td ring
+ * velocity_free_td_ring - free td ring
* @vptr: velocity
*
* Free up the transmit ring for this particular velocity adapter.
@@ -1806,7 +1806,7 @@ static void velocity_free_rings(struct velocity_info *vptr)
}
/**
- * velocity_error - handle error from controller
+ * velocity_error - handle error from controller
* @vptr: velocity
* @status: card status
*
@@ -1895,7 +1895,7 @@ static void velocity_error(struct velocity_info *vptr, int status)
}
/**
- * tx_srv - transmit interrupt service
+ * tx_srv - transmit interrupt service
* @vptr; Velocity
*
* Scan the queues looking for transmitted packets that
@@ -1963,7 +1963,7 @@ static int velocity_tx_srv(struct velocity_info *vptr)
}
/**
- * velocity_rx_csum - checksum process
+ * velocity_rx_csum - checksum process
* @rd: receive packet descriptor
* @skb: network layer packet buffer
*
@@ -1987,14 +1987,14 @@ static inline void velocity_rx_csum(struct rx_desc *rd, struct sk_buff *skb)
}
/**
- * velocity_rx_copy - in place Rx copy for small packets
+ * velocity_rx_copy - in place Rx copy for small packets
* @rx_skb: network layer packet buffer candidate
* @pkt_size: received data size
* @rd: receive packet descriptor
* @dev: network device
*
* Replace the current skb that is scheduled for Rx processing by a
- * shorter, immediatly allocated skb, if the received packet is small
+ * shorter, immediately allocated skb, if the received packet is small
* enough. This function returns a negative value if the received
* packet is too big or if memory is exhausted.
*/
@@ -2018,7 +2018,7 @@ static int velocity_rx_copy(struct sk_buff **rx_skb, int pkt_size,
}
/**
- * velocity_iph_realign - IP header alignment
+ * velocity_iph_realign - IP header alignment
* @vptr: velocity we are handling
* @skb: network layer packet buffer
* @pkt_size: received data size
@@ -2037,7 +2037,7 @@ static inline void velocity_iph_realign(struct velocity_info *vptr,
/**
- * velocity_receive_frame - received packet processor
+ * velocity_receive_frame - received packet processor
* @vptr: velocity we are handling
* @idx: ring index
*
@@ -2107,7 +2107,7 @@ static int velocity_receive_frame(struct velocity_info *vptr, int idx)
/**
- * velocity_rx_srv - service RX interrupt
+ * velocity_rx_srv - service RX interrupt
* @vptr: velocity
*
* Walk the receive ring of the velocity adapter and remove
@@ -2191,7 +2191,7 @@ static int velocity_poll(struct napi_struct *napi, int budget)
}
/**
- * velocity_intr - interrupt callback
+ * velocity_intr - interrupt callback
* @irq: interrupt number
* @dev_instance: interrupting device
*
@@ -2232,14 +2232,14 @@ static irqreturn_t velocity_intr(int irq, void *dev_instance)
}
/**
- * velocity_open - interface activation callback
+ * velocity_open - interface activation callback
* @dev: network layer device to open
*
* Called when the network layer brings the interface up. Returns
* a negative posix error code on failure, or zero on success.
*
* All the ring allocation and set up is done on open for this
- * adapter to minimise memory usage when inactive
+ * adapter to minimize memory usage when inactive
*/
static int velocity_open(struct net_device *dev)
{
@@ -2275,7 +2275,7 @@ out:
}
/**
- * velocity_shutdown - shut down the chip
+ * velocity_shutdown - shut down the chip
* @vptr: velocity to deactivate
*
* Shuts down the internal operations of the velocity and
@@ -2293,7 +2293,7 @@ static void velocity_shutdown(struct velocity_info *vptr)
}
/**
- * velocity_change_mtu - MTU change callback
+ * velocity_change_mtu - MTU change callback
* @dev: network device
* @new_mtu: desired MTU
*
@@ -2374,7 +2374,7 @@ out_0:
}
/**
- * velocity_mii_ioctl - MII ioctl handler
+ * velocity_mii_ioctl - MII ioctl handler
* @dev: network device
* @ifr: the ifreq block for the ioctl
* @cmd: the command
@@ -2415,7 +2415,7 @@ static int velocity_mii_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd
/**
- * velocity_ioctl - ioctl entry point
+ * velocity_ioctl - ioctl entry point
* @dev: network device
* @rq: interface request ioctl
* @cmd: command code
@@ -2452,7 +2452,7 @@ static int velocity_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
}
/**
- * velocity_get_status - statistics callback
+ * velocity_get_status - statistics callback
* @dev: network device
*
* Callback from the network layer to allow driver statistics
@@ -2477,24 +2477,14 @@ static struct net_device_stats *velocity_get_stats(struct net_device *dev)
dev->stats.rx_errors = vptr->mib_counter[HW_MIB_ifRxErrorPkts];
dev->stats.rx_length_errors = vptr->mib_counter[HW_MIB_ifInRangeLengthErrors];
-// unsigned long rx_dropped; /* no space in linux buffers */
dev->stats.collisions = vptr->mib_counter[HW_MIB_ifTxEtherCollisions];
- /* detailed rx_errors: */
-// unsigned long rx_length_errors;
-// unsigned long rx_over_errors; /* receiver ring buff overflow */
dev->stats.rx_crc_errors = vptr->mib_counter[HW_MIB_ifRxPktCRCE];
-// unsigned long rx_frame_errors; /* recv'd frame alignment error */
-// unsigned long rx_fifo_errors; /* recv'r fifo overrun */
-// unsigned long rx_missed_errors; /* receiver missed packet */
-
- /* detailed tx_errors */
-// unsigned long tx_fifo_errors;
return &dev->stats;
}
/**
- * velocity_close - close adapter callback
+ * velocity_close - close adapter callback
* @dev: network device
*
* Callback from the network layer when the velocity is being
@@ -2523,11 +2513,11 @@ static int velocity_close(struct net_device *dev)
}
/**
- * velocity_xmit - transmit packet callback
+ * velocity_xmit - transmit packet callback
* @skb: buffer to transmit
* @dev: network device
*
- * Called by the networ layer to request a packet is queued to
+ * Called by the network layer to request a packet is queued to
* the velocity. Returns zero on success.
*/
static netdev_tx_t velocity_xmit(struct sk_buff *skb,
@@ -2636,7 +2626,7 @@ static const struct net_device_ops velocity_netdev_ops = {
.ndo_start_xmit = velocity_xmit,
.ndo_get_stats = velocity_get_stats,
.ndo_validate_addr = eth_validate_addr,
- .ndo_set_mac_address = eth_mac_addr,
+ .ndo_set_mac_address = eth_mac_addr,
.ndo_set_multicast_list = velocity_set_multi,
.ndo_change_mtu = velocity_change_mtu,
.ndo_do_ioctl = velocity_ioctl,
@@ -2646,7 +2636,7 @@ static const struct net_device_ops velocity_netdev_ops = {
};
/**
- * velocity_init_info - init private data
+ * velocity_init_info - init private data
* @pdev: PCI device
* @vptr: Velocity info
* @info: Board type
@@ -2668,7 +2658,7 @@ static void __devinit velocity_init_info(struct pci_dev *pdev,
}
/**
- * velocity_get_pci_info - retrieve PCI info for device
+ * velocity_get_pci_info - retrieve PCI info for device
* @vptr: velocity device
* @pdev: PCI device it matches
*
@@ -2706,7 +2696,7 @@ static int __devinit velocity_get_pci_info(struct velocity_info *vptr, struct pc
}
/**
- * velocity_print_info - per driver data
+ * velocity_print_info - per driver data
* @vptr: velocity
*
* Print per driver data as the kernel driver finds Velocity
@@ -2730,7 +2720,7 @@ static u32 velocity_get_link(struct net_device *dev)
/**
- * velocity_found1 - set up discovered velocity card
+ * velocity_found1 - set up discovered velocity card
* @pdev: PCI device
* @ent: PCI device table entry that matched
*
@@ -2877,7 +2867,7 @@ err_free_dev:
#ifdef CONFIG_PM
/**
- * wol_calc_crc - WOL CRC
+ * wol_calc_crc - WOL CRC
* @pattern: data pattern
* @mask_pattern: mask
*
@@ -2912,7 +2902,7 @@ static u16 wol_calc_crc(int size, u8 *pattern, u8 *mask_pattern)
}
/**
- * velocity_set_wol - set up for wake on lan
+ * velocity_set_wol - set up for wake on lan
* @vptr: velocity to set WOL status on
*
* Set a card up for wake on lan either by unicast or by
@@ -2957,7 +2947,7 @@ static int velocity_set_wol(struct velocity_info *vptr)
memcpy(arp->ar_tip, vptr->ip_addr, 4);
crc = wol_calc_crc((sizeof(struct arp_packet) + 7) / 8, buf,
- (u8 *) & mask_pattern[0][0]);
+ (u8 *)mask_pattern[0][0]);
writew(crc, ®s->PatternCRC[0]);
writew(WOLCR_ARP_EN, ®s->WOLCRSet);
@@ -2997,7 +2987,7 @@ static int velocity_set_wol(struct velocity_info *vptr)
}
/**
- * velocity_save_context - save registers
+ * velocity_save_context - save registers
* @vptr: velocity
* @context: buffer for stored context
*
@@ -3058,7 +3048,7 @@ static int velocity_suspend(struct pci_dev *pdev, pm_message_t state)
}
/**
- * velocity_restore_context - restore registers
+ * velocity_restore_context - restore registers
* @vptr: velocity
* @context: buffer for stored context
*
@@ -3133,19 +3123,19 @@ static int velocity_resume(struct pci_dev *pdev)
* uses this to handle all our card discover and plugging
*/
static struct pci_driver velocity_driver = {
- .name = VELOCITY_NAME,
- .id_table = velocity_id_table,
- .probe = velocity_found1,
- .remove = __devexit_p(velocity_remove1),
+ .name = VELOCITY_NAME,
+ .id_table = velocity_id_table,
+ .probe = velocity_found1,
+ .remove = __devexit_p(velocity_remove1),
#ifdef CONFIG_PM
- .suspend = velocity_suspend,
- .resume = velocity_resume,
+ .suspend = velocity_suspend,
+ .resume = velocity_resume,
#endif
};
/**
- * velocity_ethtool_up - pre hook for ethtool
+ * velocity_ethtool_up - pre hook for ethtool
* @dev: network device
*
* Called before an ethtool operation. We need to make sure the
@@ -3160,7 +3150,7 @@ static int velocity_ethtool_up(struct net_device *dev)
}
/**
- * velocity_ethtool_down - post hook for ethtool
+ * velocity_ethtool_down - post hook for ethtool
* @dev: network device
*
* Called after an ethtool operation. Restore the chip back to D3
@@ -3352,8 +3342,7 @@ static int get_pending_timer_val(int val)
int mult_bits = val >> 6;
int mult = 1;
- switch (mult_bits)
- {
+ switch (mult_bits) {
case 1:
mult = 4; break;
case 2:
@@ -3454,7 +3443,7 @@ static const struct ethtool_ops velocity_ethtool_ops = {
.set_wol = velocity_ethtool_set_wol,
.get_msglevel = velocity_get_msglevel,
.set_msglevel = velocity_set_msglevel,
- .set_sg = ethtool_op_set_sg,
+ .set_sg = ethtool_op_set_sg,
.get_link = velocity_get_link,
.get_coalesce = velocity_get_coalesce,
.set_coalesce = velocity_set_coalesce,
@@ -3480,7 +3469,7 @@ static int velocity_netdev_event(struct notifier_block *nb, unsigned long notifi
#if defined(CONFIG_PM) && defined(CONFIG_INET)
static struct notifier_block velocity_inetaddr_notifier = {
- .notifier_call = velocity_netdev_event,
+ .notifier_call = velocity_netdev_event,
};
static void velocity_register_notifier(void)
@@ -3501,7 +3490,7 @@ static void velocity_unregister_notifier(void)
#endif /* defined(CONFIG_PM) && defined(CONFIG_INET) */
/**
- * velocity_init_module - load time function
+ * velocity_init_module - load time function
*
* Called when the velocity module is loaded. The PCI driver
* is registered with the PCI layer, and in turn will call
@@ -3520,7 +3509,7 @@ static int __init velocity_init_module(void)
}
/**
- * velocity_cleanup - module unload
+ * velocity_cleanup - module unload
*
* When the velocity hardware is unloaded this function is called.
* It will clean up the notifiers and the unregister the PCI
--
1.7.3.2
^ permalink raw reply related
* [PATCH] cxgb3: fix crash due to manipulating queues before registration
From: Nishanth Aravamudan @ 2010-10-28 5:06 UTC (permalink / raw)
To: eric.dumazet
Cc: Nishanth Aravamudan, sonnyrao, Divy Le Ray, Dimitris Michailidis,
netdev, linux-kernel
In-Reply-To: <1288236563.2658.59.camel@edumazet-laptop>
Hi Eric,
Something like the following?:
Thanks,
Nish
Along the same lines as "cxgb4: fix crash due to manipulating queues
before registration" (8f6d9f40476895571df039b6f1f5230ec7faebad), before
commit "net: allocate tx queues in register_netdevice"
netif_tx_stop_all_queues and related functions could be used between
device allocation and registration but now only after registration.
cxgb4 has such a call before registration and crashes now. Move it
after register_netdev.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: eric.dumazet@gmail.com
Cc: sonnyrao@us.ibm.com
Cc: Divy Le Ray <divy@chelsio.com>
Cc: Dimitris Michailidis <dm@chelsio.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
drivers/net/cxgb3/cxgb3_main.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c
index 4e3c123..96c70a5 100644
--- a/drivers/net/cxgb3/cxgb3_main.c
+++ b/drivers/net/cxgb3/cxgb3_main.c
@@ -3301,7 +3301,6 @@ static int __devinit init_one(struct pci_dev *pdev,
pi->rx_offload = T3_RX_CSUM | T3_LRO;
pi->port_id = i;
netif_carrier_off(netdev);
- netif_tx_stop_all_queues(netdev);
netdev->irq = pdev->irq;
netdev->mem_start = mmio_start;
netdev->mem_end = mmio_start + mmio_len - 1;
@@ -3342,6 +3341,7 @@ static int __devinit init_one(struct pci_dev *pdev,
adapter->name = adapter->port[i]->name;
__set_bit(i, &adapter->registered_device_map);
+ netif_tx_stop_all_queues(adapter->port[i]);
}
}
if (!adapter->registered_device_map) {
--
1.7.1
^ permalink raw reply related
* [PATCH] cxgb3: Fix panic in free_tx_desc()
From: Krishna Kumar @ 2010-10-28 5:10 UTC (permalink / raw)
To: davem, divy; +Cc: netdev, Krishna Kumar
I got a few of these panics (on 2.6.36-rc7) when running high
number of netperf sessions:
BUG: unable to handle kernel paging request at 0000100000000000
IP: [<ffffffff813125f0>] skb_release_data+0xa0/0xd0
Oops: 0000 [#1] SMP
Pid: 2155, comm: vhost-2115 Not tainted 2.6.36-rc7-ORG #1 49Y6512 /System x3650 M2 -[7947AC1]-
RIP: 0010:[<ffffffff813125f0>] [<ffffffff813125f0>] skb_release_data+0xa0/0xd0
RSP: 0018:ffff880001803738 EFLAGS: 00010206
RAX: ffff880179b0fc00 RBX: ffff880178b441c0 RCX: 0000000000000000
RSP: 0018:ffff880001803738 EFLAGS: 00010206
RAX: ffff880179b0fc00 RBX: ffff880178b441c0 RCX: 0000000000000000
RDX: ffff880179b0fd40 RSI: 0000000000000000 RDI: 0000100000000000
RBP: ffff880001803748 R08: 0000000000000001 R09: ffff88017f117000
R10: ffff88017b990608 R11: ffff88017f117090 R12: ffff880178b441c0
R13: ffff88017f117090 R14: 0000000000000000 R15: ffff880178b441c0
FS: 0000000000000000(0000) GS:ffff880001800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000100000000000 CR3: 000000017ea64000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process vhost-2115 (pid: 2155, threadinfo ffff88017d872000, task ffff88017e954680)
Stack:
ffff880178b441c0 0000000000000007 ffff880001803768 ffffffff81312119
<0> 0000000000000000 0000000000000002 ffff880001803778 ffffffff813121f9
<0> ffff880001803818 ffffffffa012d14c ffffffffa02de076 ffff880001803700
Call Trace:
<IRQ>
[<ffffffff81312119>] __kfree_skb+0x19/0xa0
[<ffffffff813121f9>] kfree_skb+0x19/0x40
[<ffffffffa012d14c>] free_tx_desc+0x2fc/0x350 [cxgb3]
[<ffffffffa02de076>] ? vhost_poll_wakeup+0x16/0x20 [vhost_net]
[<ffffffffa01323db>] t3_eth_xmit+0x28b/0x380 [cxgb3]
[<ffffffff8131ce47>] dev_hard_start_xmit+0x377/0x5a0
[<ffffffff81335a4a>] sch_direct_xmit+0xfa/0x1d0
[<ffffffff8131d1a9>] dev_queue_xmit+0x139/0x450
[<ffffffff81326225>] neigh_resolve_output+0x125/0x340
[<ffffffff8135a77c>] ip_finish_output+0x14c/0x320
[<ffffffff8135a9fe>] ip_output+0xae/0xc0
[<ffffffff8135620f>] ip_forward_finish+0x3f/0x50
[<ffffffff8135641f>] ip_forward+0x1ff/0x400
[<ffffffff81354789>] ip_rcv_finish+0x119/0x3e0
[<ffffffff81354c7d>] ip_rcv+0x22d/0x300
[<ffffffff8131a95b>] __netif_receive_skb+0x29b/0x570
[<ffffffff8131ba70>] ? netif_receive_skb+0x0/0x80
[<ffffffff8131bae8>] netif_receive_skb+0x78/0x80
[<ffffffffa02a96d8>] br_handle_frame_finish+0x198/0x260 [bridge]
[<ffffffffa02aebc8>] br_nf_pre_routing_finish+0x238/0x380 [bridge]
[<ffffffff813424bc>] ? nf_hook_slow+0x6c/0x100
[<ffffffffa02ae990>] ? br_nf_pre_routing_finish+0x0/0x380 [bridge]
[<ffffffffa02afb08>] br_nf_pre_routing+0x698/0x7a0 [bridge]
[<ffffffff81342414>] nf_iterate+0x64/0xa0
[<ffffffffa02a9540>] ? br_handle_frame_finish+0x0/0x260 [bridge]
[<ffffffff813424bc>] nf_hook_slow+0x6c/0x100
[<ffffffffa02a9540>] ? br_handle_frame_finish+0x0/0x260 [bridge]
[<ffffffffa02a9931>] br_handle_frame+0x191/0x240 [bridge]
[<ffffffffa02a97a0>] ? br_handle_frame+0x0/0x240 [bridge]
[<ffffffff8131a863>] __netif_receive_skb+0x1a3/0x570
[<ffffffff812ef3f6>] ? dma_issue_pending_all+0x76/0xa0
[<ffffffff8131ad32>] process_backlog+0x102/0x200
[<ffffffff8131c2d0>] net_rx_action+0x100/0x220
[<ffffffff810548ef>] __do_softirq+0xaf/0x140
[<ffffffff8100bcdc>] call_softirq+0x1c/0x30
[<ffffffff8100dfc5>] ? do_softirq+0x65/0xa0
[<ffffffff8131c6b8>] netif_rx_ni+0x28/0x30
[<ffffffffa02c305d>] tun_sendmsg+0x2cd/0x4b0 [tun]
[<ffffffffa02e01af>] handle_tx+0x1df/0x340 [vhost_net]
[<ffffffffa02e0340>] handle_tx_kick+0x10/0x20 [vhost_net]
[<ffffffffa02de29b>] vhost_worker+0xbb/0x130 [vhost_net]
[<ffffffffa02de1e0>] ? vhost_worker+0x0/0x130 [vhost_net]
[<ffffffffa02de1e0>] ? vhost_worker+0x0/0x130 [vhost_net]
[<ffffffff81069686>] kthread+0x96/0xa0
[<ffffffff8100bbe4>] kernel_thread_helper+0x4/0x10
[<ffffffff810695f0>] ? kthread+0x0/0xa0
[<ffffffff8100bbe0>] ? kernel_thread_helper+0x0/0x10
Code: 8b 94 24 d0 00 00 00 49 8b 84 24 d8 00 00 00 48 8d 14 10 0f b7 0a 39 d9 7f d1 48 8b 7a 10 48 85 ff 74 20 48 c7 42 10 00 00 00 00 <48> 8b 1f e8 e8 fb ff ff 48 85 db 48 89 df 75 f0 49 8b 84 24 d8
Patch below fixes the panic. cxgb4 and cxgb4vf already have this fix.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
drivers/net/cxgb3/sge.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff -ruNp org/drivers/net/cxgb3/sge.c new/drivers/net/cxgb3/sge.c
--- org/drivers/net/cxgb3/sge.c 2010-10-25 09:10:47.000000000 +0530
+++ new/drivers/net/cxgb3/sge.c 2010-10-28 10:22:47.000000000 +0530
@@ -296,8 +296,10 @@ static void free_tx_desc(struct adapter
if (d->skb) { /* an SGL is present */
if (need_unmap)
unmap_skb(d->skb, q, cidx, pdev);
- if (d->eop)
+ if (d->eop) {
kfree_skb(d->skb);
+ d->skb = NULL;
+ }
}
++d;
if (++cidx == q->size) {
^ permalink raw reply
* [PATCH] virtio_net: Fix queue full check
From: Krishna Kumar @ 2010-10-28 5:10 UTC (permalink / raw)
To: rusty, davem; +Cc: netdev, Krishna Kumar
I get many queue full errors being wrongly reported when running
parallel netperfs:
Oct 17 10:22:40 localhost kernel: net eth0: Unexpected TX queue failure: -28
Oct 17 10:28:22 localhost kernel: net eth0: Unexpected TX queue failure: -28
Oct 17 10:35:58 localhost kernel: net eth0: Unexpected TX queue failure: -28
Oct 17 10:41:06 localhost kernel: net eth0: Unexpected TX queue failure: -28
I initially changed the check from -ENOMEM to -ENOSPC, but
virtqueue_add_buf can return only -ENOSPC when it doesn't have
space for new request. Patch removes redundant checks but
displays the failure errno.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
drivers/net/virtio_net.c | 15 ++++-----------
1 file changed, 4 insertions(+), 11 deletions(-)
diff -ruNp org/drivers/net/virtio_net.c new/drivers/net/virtio_net.c
--- org/drivers/net/virtio_net.c 2010-10-11 10:20:02.000000000 +0530
+++ new/drivers/net/virtio_net.c 2010-10-21 17:37:45.000000000 +0530
@@ -570,17 +570,10 @@ static netdev_tx_t start_xmit(struct sk_
/* This can happen with OOM and indirect buffers. */
if (unlikely(capacity < 0)) {
- if (net_ratelimit()) {
- if (likely(capacity == -ENOMEM)) {
- dev_warn(&dev->dev,
- "TX queue failure: out of memory\n");
- } else {
- dev->stats.tx_fifo_errors++;
- dev_warn(&dev->dev,
- "Unexpected TX queue failure: %d\n",
- capacity);
- }
- }
+ if (net_ratelimit())
+ dev_warn(&dev->dev,
+ "TX queue failure (%d): out of memory\n",
+ capacity);
dev->stats.tx_dropped++;
kfree_skb(skb);
return NETDEV_TX_OK;
^ permalink raw reply
* Re: [v3 RFC PATCH 0/4] Implement multiqueue virtio-net
From: Krishna Kumar2 @ 2010-10-28 5:14 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: anthony, arnd, avi, davem, eric.dumazet, kvm, netdev, rusty
In-Reply-To: <20101026110913.GC7922@redhat.com>
"Michael S. Tsirkin" <mst@redhat.com> wrote on 10/26/2010 04:39:13 PM:
(merging two posts into one)
> I think we discussed the need for external to guest testing
> over 10G. For large messages we should not see any change
> but you should be able to get better numbers for small messages
> assuming a MQ NIC card.
For external host, there is a contention among different
queues (vhosts) when packets are processed in tun/bridge,
unless I implement MQ TX for macvtap (tun/bridge?). So
my testing shows a small improvement (1 to 1.5% average)
in BW and a rise in SD (between 10-15%). For remote host,
I think tun/macvtap needs MQ TX support?
> > > > Results for UDP BW tests (unidirectional, sum across
> > > > 3 iterations, each iteration of 45 seconds, default
> > > > netperf, vhosts bound to cpus 0-3; no other tuning):
> > >
> > > Is binding vhost threads to CPUs really required?
> > > What happens if we let the scheduler do its job?
> >
> > Nothing drastic, I remember BW% and SD% both improved a
> > bit as a result of binding.
>
> If there's a significant improvement this would mean that
> we need to rethink the vhost-net interaction with the scheduler.
I will get a test run with and without binding and post the
results later today.
Thanks,
- KK
^ permalink raw reply
* net-next-2.6 [PATCH 0/4] dccp: CCID packet dequeueing interface
From: Gerrit Renker @ 2010-10-28 5:16 UTC (permalink / raw)
To: davem; +Cc: dccp, netdev
In-Reply-To: <dccp_packet_dequeueing_interface_set_of_4>
Dave,
can you please consider the following set of 4, which solves several problems
of the CCID packet-dequeueing interface. This set has been tested for about
3 years.
Patch #1: sets up a return code convention to handle both rate-based and
window-based congestion-control IDs (CCIDs) with one function.
Patch #2: generalizes the xmit interface: rate-based CCIDs continue to use
timers, whereas window-based ones use a tasklet.
Patch #3: fixes issues in cleaning up CCID state when closing a connection,
like patch #2 this also divides into rate-based/window-based.
Patch #4: using the previous patches, takes CCID-2 out of its polling mode
(that was the problem of the xmit interface - CCID-2 kept on
abusing it by faking a small timeout, constantly polling until
its internal congestion state allowed it to send again).
The set has also been placed into a fresh (today's) copy of net-next-2.6, on
git://eden-feed.erg.abdn.ac.uk/net-next-2.6 [subtree 'dccp']
The set is fully bisectable.
---
include/linux/dccp.h | 4 +-
net/dccp/ccid.h | 34 +++++++-
net/dccp/ccids/ccid2.c | 23 ++++--
net/dccp/ccids/ccid2.h | 5 +
net/dccp/ccids/ccid3.c | 12 ++--
net/dccp/dccp.h | 5 +-
net/dccp/output.c | 209 +++++++++++++++++++++++++++++-------------------
net/dccp/proto.c | 21 +++++-
net/dccp/timer.c | 27 ++++---
9 files changed, 223 insertions(+), 117 deletions(-)
^ permalink raw reply
* [PATCH 4/4] dccp ccid-2: Stop polling
From: Gerrit Renker @ 2010-10-28 5:16 UTC (permalink / raw)
To: davem; +Cc: dccp, netdev, Gerrit Renker
In-Reply-To: <1288242988-5333-4-git-send-email-gerrit@erg.abdn.ac.uk>
This updates CCID-2 to use the CCID dequeuing mechanism, converting from
previous continuous-polling to a now event-driven mechanism.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
---
net/dccp/ccids/ccid2.c | 23 +++++++++++++++--------
net/dccp/ccids/ccid2.h | 5 +++++
2 files changed, 20 insertions(+), 8 deletions(-)
--- a/net/dccp/ccids/ccid2.h
+++ b/net/dccp/ccids/ccid2.h
@@ -81,6 +81,11 @@ struct ccid2_hc_tx_sock {
u64 tx_high_ack;
};
+static inline bool ccid2_cwnd_network_limited(struct ccid2_hc_tx_sock *hc)
+{
+ return hc->tx_pipe >= hc->tx_cwnd;
+}
+
struct ccid2_hc_rx_sock {
int rx_data;
};
--- a/net/dccp/ccids/ccid2.c
+++ b/net/dccp/ccids/ccid2.c
@@ -78,12 +78,9 @@ static int ccid2_hc_tx_alloc_seq(struct ccid2_hc_tx_sock *hc)
static int ccid2_hc_tx_send_packet(struct sock *sk, struct sk_buff *skb)
{
- struct ccid2_hc_tx_sock *hc = ccid2_hc_tx_sk(sk);
-
- if (hc->tx_pipe < hc->tx_cwnd)
- return 0;
-
- return 1; /* XXX CCID should dequeue when ready instead of polling */
+ if (ccid2_cwnd_network_limited(ccid2_hc_tx_sk(sk)))
+ return CCID_PACKET_WILL_DEQUEUE_LATER;
+ return CCID_PACKET_SEND_AT_ONCE;
}
static void ccid2_change_l_ack_ratio(struct sock *sk, u32 val)
@@ -115,6 +112,7 @@ static void ccid2_hc_tx_rto_expire(unsigned long data)
{
struct sock *sk = (struct sock *)data;
struct ccid2_hc_tx_sock *hc = ccid2_hc_tx_sk(sk);
+ const bool sender_was_blocked = ccid2_cwnd_network_limited(hc);
bh_lock_sock(sk);
if (sock_owned_by_user(sk)) {
@@ -129,8 +127,6 @@ static void ccid2_hc_tx_rto_expire(unsigned long data)
if (hc->tx_rto > DCCP_RTO_MAX)
hc->tx_rto = DCCP_RTO_MAX;
- sk_reset_timer(sk, &hc->tx_rtotimer, jiffies + hc->tx_rto);
-
/* adjust pipe, cwnd etc */
hc->tx_ssthresh = hc->tx_cwnd / 2;
if (hc->tx_ssthresh < 2)
@@ -146,6 +142,12 @@ static void ccid2_hc_tx_rto_expire(unsigned long data)
hc->tx_rpseq = 0;
hc->tx_rpdupack = -1;
ccid2_change_l_ack_ratio(sk, 1);
+
+ /* if we were blocked before, we may now send cwnd=1 packet */
+ if (sender_was_blocked)
+ tasklet_schedule(&dccp_sk(sk)->dccps_xmitlet);
+ /* restart backed-off timer */
+ sk_reset_timer(sk, &hc->tx_rtotimer, jiffies + hc->tx_rto);
out:
bh_unlock_sock(sk);
sock_put(sk);
@@ -434,6 +436,7 @@ static void ccid2_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
{
struct dccp_sock *dp = dccp_sk(sk);
struct ccid2_hc_tx_sock *hc = ccid2_hc_tx_sk(sk);
+ const bool sender_was_blocked = ccid2_cwnd_network_limited(hc);
u64 ackno, seqno;
struct ccid2_seq *seqp;
unsigned char *vector;
@@ -631,6 +634,10 @@ static void ccid2_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
sk_stop_timer(sk, &hc->tx_rtotimer);
else
sk_reset_timer(sk, &hc->tx_rtotimer, jiffies + hc->tx_rto);
+
+ /* check if incoming Acks allow pending packets to be sent */
+ if (sender_was_blocked && !ccid2_cwnd_network_limited(hc))
+ tasklet_schedule(&dccp_sk(sk)->dccps_xmitlet);
}
static int ccid2_hc_tx_init(struct ccid *ccid, struct sock *sk)
^ permalink raw reply
* [PATCH 1/4] dccp: Return-value convention of hc_tx_send_packet()
From: Gerrit Renker @ 2010-10-28 5:16 UTC (permalink / raw)
To: davem; +Cc: dccp, netdev, Gerrit Renker
In-Reply-To: <1288242988-5333-1-git-send-email-gerrit@erg.abdn.ac.uk>
This patch reorganises the return value convention of the CCID TX sending
function, to permit more flexible schemes, as required by subsequent patches.
Currently the convention is
* values < 0 mean error,
* a value == 0 means "send now", and
* a value x > 0 means "send in x milliseconds".
The patch provides symbolic constants and a function to interpret return values.
In addition, it caps the maximum positive return value to 0xFFFF milliseconds,
corresponding to 65.535 seconds. This is possible since in CCID-3/4 the
maximum possible inter-packet gap is fixed at t_mbi = 64 sec.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
---
net/dccp/ccid.h | 34 +++++++++++++++++++++++++++++++---
net/dccp/ccids/ccid3.c | 12 ++++++------
2 files changed, 37 insertions(+), 9 deletions(-)
--- a/net/dccp/ccid.h
+++ b/net/dccp/ccid.h
@@ -134,13 +134,41 @@ static inline int ccid_get_current_tx_ccid(struct dccp_sock *dp)
extern void ccid_hc_rx_delete(struct ccid *ccid, struct sock *sk);
extern void ccid_hc_tx_delete(struct ccid *ccid, struct sock *sk);
+/*
+ * Congestion control of queued data packets via CCID decision.
+ *
+ * The TX CCID performs its congestion-control by indicating whether and when a
+ * queued packet may be sent, using the return code of ccid_hc_tx_send_packet().
+ * The following modes are supported via the symbolic constants below:
+ * - timer-based pacing (CCID returns a delay value in milliseconds);
+ * - autonomous dequeueing (CCID internally schedules dccps_xmitlet).
+ */
+
+enum ccid_dequeueing_decision {
+ CCID_PACKET_SEND_AT_ONCE = 0x00000, /* "green light": no delay */
+ CCID_PACKET_DELAY_MAX = 0x0FFFF, /* maximum delay in msecs */
+ CCID_PACKET_DELAY = 0x10000, /* CCID msec-delay mode */
+ CCID_PACKET_WILL_DEQUEUE_LATER = 0x20000, /* CCID autonomous mode */
+ CCID_PACKET_ERR = 0xF0000, /* error condition */
+};
+
+static inline int ccid_packet_dequeue_eval(const int return_code)
+{
+ if (return_code < 0)
+ return CCID_PACKET_ERR;
+ if (return_code == 0)
+ return CCID_PACKET_SEND_AT_ONCE;
+ if (return_code <= CCID_PACKET_DELAY_MAX)
+ return CCID_PACKET_DELAY;
+ return return_code;
+}
+
static inline int ccid_hc_tx_send_packet(struct ccid *ccid, struct sock *sk,
struct sk_buff *skb)
{
- int rc = 0;
if (ccid->ccid_ops->ccid_hc_tx_send_packet != NULL)
- rc = ccid->ccid_ops->ccid_hc_tx_send_packet(sk, skb);
- return rc;
+ return ccid->ccid_ops->ccid_hc_tx_send_packet(sk, skb);
+ return CCID_PACKET_SEND_AT_ONCE;
}
static inline void ccid_hc_tx_packet_sent(struct ccid *ccid, struct sock *sk,
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -268,11 +268,11 @@ out:
sock_put(sk);
}
-/*
- * returns
- * > 0: delay (in msecs) that should pass before actually sending
- * = 0: can send immediately
- * < 0: error condition; do not send packet
+/**
+ * ccid3_hc_tx_send_packet - Delay-based dequeueing of TX packets
+ * @skb: next packet candidate to send on @sk
+ * This function uses the convention of ccid_packet_dequeue_eval() and
+ * returns a millisecond-delay value between 0 and t_mbi = 64000 msec.
*/
static int ccid3_hc_tx_send_packet(struct sock *sk, struct sk_buff *skb)
{
@@ -348,7 +348,7 @@ static int ccid3_hc_tx_send_packet(struct sock *sk, struct sk_buff *skb)
/* set the nominal send time for the next following packet */
hc->tx_t_nom = ktime_add_us(hc->tx_t_nom, hc->tx_t_ipi);
- return 0;
+ return CCID_PACKET_SEND_AT_ONCE;
}
static void ccid3_hc_tx_packet_sent(struct sock *sk, unsigned int len)
^ permalink raw reply
* [PATCH 2/4] dccp: Extend CCID packet dequeueing interface
From: Gerrit Renker @ 2010-10-28 5:16 UTC (permalink / raw)
To: davem; +Cc: dccp, netdev, Gerrit Renker
In-Reply-To: <1288242988-5333-2-git-send-email-gerrit@erg.abdn.ac.uk>
This extends the packet dequeuing interface of dccp_write_xmit() to allow
1. CCIDs to take care of timing when the next packet may be sent;
2. delayed sending (as before, with an inter-packet gap up to 65.535 seconds).
The main purpose is to take CCID-2 out of its polling mode (when it is network-
limited, it tries every millisecond to send, without interruption).
The mode of operation for (2) is as follows:
* new packet is enqueued via dccp_sendmsg() => dccp_write_xmit(),
* ccid_hc_tx_send_packet() detects that it may not send (e.g. window full),
* it signals this condition via `CCID_PACKET_WILL_DEQUEUE_LATER',
* dccp_write_xmit() returns without further action;
* after some time the wait-condition for CCID becomes true,
* that CCID schedules the tasklet,
* tasklet function calls ccid_hc_tx_send_packet() via dccp_write_xmit(),
* since the wait-condition is now true, ccid_hc_tx_packet() returns "send now",
* packet is sent, and possibly more (since dccp_write_xmit() loops).
Code reuse: the taskled function calls dccp_write_xmit(), the timer function
reduces to a wrapper around the same code.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
---
include/linux/dccp.h | 4 +-
net/dccp/output.c | 118 ++++++++++++++++++++++++++++++-------------------
net/dccp/timer.c | 25 ++++++-----
3 files changed, 89 insertions(+), 58 deletions(-)
--- a/include/linux/dccp.h
+++ b/include/linux/dccp.h
@@ -462,7 +462,8 @@ struct dccp_ackvec;
* @dccps_hc_rx_insert_options - receiver wants to add options when acking
* @dccps_hc_tx_insert_options - sender wants to add options when sending
* @dccps_server_timewait - server holds timewait state on close (RFC 4340, 8.3)
- * @dccps_xmit_timer - timer for when CCID is not ready to send
+ * @dccps_xmitlet - tasklet scheduled by the TX CCID to dequeue data packets
+ * @dccps_xmit_timer - used by the TX CCID to delay sending (rate-based pacing)
* @dccps_syn_rtt - RTT sample from Request/Response exchange (in usecs)
*/
struct dccp_sock {
@@ -502,6 +503,7 @@ struct dccp_sock {
__u8 dccps_hc_rx_insert_options:1;
__u8 dccps_hc_tx_insert_options:1;
__u8 dccps_server_timewait:1;
+ struct tasklet_struct dccps_xmitlet;
struct timer_list dccps_xmit_timer;
};
--- a/net/dccp/output.c
+++ b/net/dccp/output.c
@@ -254,63 +254,89 @@ do_interrupted:
goto out;
}
+/**
+ * dccp_xmit_packet - Send data packet under control of CCID
+ * Transmits next-queued payload and informs CCID to account for the packet.
+ */
+static void dccp_xmit_packet(struct sock *sk)
+{
+ int err, len;
+ struct dccp_sock *dp = dccp_sk(sk);
+ struct sk_buff *skb = skb_dequeue(&sk->sk_write_queue);
+
+ if (unlikely(skb == NULL))
+ return;
+ len = skb->len;
+
+ if (sk->sk_state == DCCP_PARTOPEN) {
+ const u32 cur_mps = dp->dccps_mss_cache - DCCP_FEATNEG_OVERHEAD;
+ /*
+ * See 8.1.5 - Handshake Completion.
+ *
+ * For robustness we resend Confirm options until the client has
+ * entered OPEN. During the initial feature negotiation, the MPS
+ * is smaller than usual, reduced by the Change/Confirm options.
+ */
+ if (!list_empty(&dp->dccps_featneg) && len > cur_mps) {
+ DCCP_WARN("Payload too large (%d) for featneg.\n", len);
+ dccp_send_ack(sk);
+ dccp_feat_list_purge(&dp->dccps_featneg);
+ }
+
+ inet_csk_schedule_ack(sk);
+ inet_csk_reset_xmit_timer(sk, ICSK_TIME_DACK,
+ inet_csk(sk)->icsk_rto,
+ DCCP_RTO_MAX);
+ DCCP_SKB_CB(skb)->dccpd_type = DCCP_PKT_DATAACK;
+ } else if (dccp_ack_pending(sk)) {
+ DCCP_SKB_CB(skb)->dccpd_type = DCCP_PKT_DATAACK;
+ } else {
+ DCCP_SKB_CB(skb)->dccpd_type = DCCP_PKT_DATA;
+ }
+
+ err = dccp_transmit_skb(sk, skb);
+ if (err)
+ dccp_pr_debug("transmit_skb() returned err=%d\n", err);
+ /*
+ * Register this one as sent even if an error occurred. To the remote
+ * end a local packet drop is indistinguishable from network loss, i.e.
+ * any local drop will eventually be reported via receiver feedback.
+ */
+ ccid_hc_tx_packet_sent(dp->dccps_hc_tx_ccid, sk, len);
+}
+
void dccp_write_xmit(struct sock *sk, int block)
{
struct dccp_sock *dp = dccp_sk(sk);
struct sk_buff *skb;
while ((skb = skb_peek(&sk->sk_write_queue))) {
- int err = ccid_hc_tx_send_packet(dp->dccps_hc_tx_ccid, sk, skb);
+ int rc = ccid_hc_tx_send_packet(dp->dccps_hc_tx_ccid, sk, skb);
- if (err > 0) {
+ switch (ccid_packet_dequeue_eval(rc)) {
+ case CCID_PACKET_WILL_DEQUEUE_LATER:
+ return;
+ case CCID_PACKET_DELAY:
if (!block) {
sk_reset_timer(sk, &dp->dccps_xmit_timer,
- msecs_to_jiffies(err)+jiffies);
+ msecs_to_jiffies(rc)+jiffies);
+ return;
+ }
+ rc = dccp_wait_for_ccid(sk, skb, rc);
+ if (rc && rc != -EINTR) {
+ DCCP_BUG("err=%d after dccp_wait_for_ccid", rc);
+ skb_dequeue(&sk->sk_write_queue);
+ kfree_skb(skb);
break;
- } else
- err = dccp_wait_for_ccid(sk, skb, err);
- if (err && err != -EINTR)
- DCCP_BUG("err=%d after dccp_wait_for_ccid", err);
- }
-
- skb_dequeue(&sk->sk_write_queue);
- if (err == 0) {
- struct dccp_skb_cb *dcb = DCCP_SKB_CB(skb);
- const int len = skb->len;
-
- if (sk->sk_state == DCCP_PARTOPEN) {
- const u32 cur_mps = dp->dccps_mss_cache - DCCP_FEATNEG_OVERHEAD;
- /*
- * See 8.1.5 - Handshake Completion.
- *
- * For robustness we resend Confirm options until the client has
- * entered OPEN. During the initial feature negotiation, the MPS
- * is smaller than usual, reduced by the Change/Confirm options.
- */
- if (!list_empty(&dp->dccps_featneg) && len > cur_mps) {
- DCCP_WARN("Payload too large (%d) for featneg.\n", len);
- dccp_send_ack(sk);
- dccp_feat_list_purge(&dp->dccps_featneg);
- }
-
- inet_csk_schedule_ack(sk);
- inet_csk_reset_xmit_timer(sk, ICSK_TIME_DACK,
- inet_csk(sk)->icsk_rto,
- DCCP_RTO_MAX);
- dcb->dccpd_type = DCCP_PKT_DATAACK;
- } else if (dccp_ack_pending(sk))
- dcb->dccpd_type = DCCP_PKT_DATAACK;
- else
- dcb->dccpd_type = DCCP_PKT_DATA;
-
- err = dccp_transmit_skb(sk, skb);
- ccid_hc_tx_packet_sent(dp->dccps_hc_tx_ccid, sk, len);
- if (err)
- DCCP_BUG("err=%d after ccid_hc_tx_packet_sent",
- err);
- } else {
- dccp_pr_debug("packet discarded due to err=%d\n", err);
+ }
+ /* fall through */
+ case CCID_PACKET_SEND_AT_ONCE:
+ dccp_xmit_packet(sk);
+ break;
+ case CCID_PACKET_ERR:
+ skb_dequeue(&sk->sk_write_queue);
kfree_skb(skb);
+ dccp_pr_debug("packet discarded due to err=%d\n", rc);
}
}
}
--- a/net/dccp/timer.c
+++ b/net/dccp/timer.c
@@ -237,32 +237,35 @@ out:
sock_put(sk);
}
-/* Transmit-delay timer: used by the CCIDs to delay actual send time */
-static void dccp_write_xmit_timer(unsigned long data)
+/**
+ * dccp_write_xmitlet - Workhorse for CCID packet dequeueing interface
+ * See the comments above %ccid_dequeueing_decision for supported modes.
+ */
+static void dccp_write_xmitlet(unsigned long data)
{
struct sock *sk = (struct sock *)data;
- struct dccp_sock *dp = dccp_sk(sk);
bh_lock_sock(sk);
if (sock_owned_by_user(sk))
- sk_reset_timer(sk, &dp->dccps_xmit_timer, jiffies+1);
+ sk_reset_timer(sk, &dccp_sk(sk)->dccps_xmit_timer, jiffies + 1);
else
dccp_write_xmit(sk, 0);
bh_unlock_sock(sk);
- sock_put(sk);
}
-static void dccp_init_write_xmit_timer(struct sock *sk)
+static void dccp_write_xmit_timer(unsigned long data)
{
- struct dccp_sock *dp = dccp_sk(sk);
-
- setup_timer(&dp->dccps_xmit_timer, dccp_write_xmit_timer,
- (unsigned long)sk);
+ dccp_write_xmitlet(data);
+ sock_put((struct sock *)data);
}
void dccp_init_xmit_timers(struct sock *sk)
{
- dccp_init_write_xmit_timer(sk);
+ struct dccp_sock *dp = dccp_sk(sk);
+
+ tasklet_init(&dp->dccps_xmitlet, dccp_write_xmitlet, (unsigned long)sk);
+ setup_timer(&dp->dccps_xmit_timer, dccp_write_xmit_timer,
+ (unsigned long)sk);
inet_csk_init_xmit_timers(sk, &dccp_write_timer, &dccp_delack_timer,
&dccp_keepalive_timer);
}
^ permalink raw reply
* [PATCH 3/4] dccp: Refine the wait-for-ccid mechanism
From: Gerrit Renker @ 2010-10-28 5:16 UTC (permalink / raw)
To: davem; +Cc: dccp, netdev, Gerrit Renker
In-Reply-To: <1288242988-5333-3-git-send-email-gerrit@erg.abdn.ac.uk>
This extends the existing wait-for-ccid routine so that it may be used with
different types of CCID, addressing the following problems:
1) The queue-drain mechanism only works with rate-based CCIDs. If CCID-2 for
example has a full TX queue and becomes network-limited just as the
application wants to close, then waiting for CCID-2 to become unblocked
could lead to an indefinite delay (i.e., application "hangs").
2) Since each TX CCID in turn uses a feedback mechanism, there may be changes
in its sending policy while the queue is being drained. This can lead to
further delays during which the application will not be able to terminate.
3) The minimum wait time for CCID-3/4 can be expected to be the queue length
times the current inter-packet delay. For example if tx_qlen=100 and a delay
of 15 ms is used for each packet, then the application would have to wait
for a minimum of 1.5 seconds before being allowed to exit.
4) There is no way for the user/application to control this behaviour. It would
be good to use the timeout argument of dccp_close() as an upper bound. Then
the maximum time that an application is willing to wait for its CCIDs to can
be set via the SO_LINGER option.
These problems are addressed by giving the CCID a grace period of up to the
`timeout' value.
The wait-for-ccid function is, as before, used when the application
(a) has read all the data in its receive buffer and
(b) if SO_LINGER was set with a non-zero linger time, or
(c) the socket is either in the OPEN (active close) or in the PASSIVE_CLOSEREQ
state (client application closes after receiving CloseReq).
In addition, there is a catch-all case of __skb_queue_purge() after waiting for
the CCID. This is necessary since the write queue may still have data when
(a) the host has been passively-closed,
(b) abnormal termination (unread data, zero linger time),
(c) wait-for-ccid could not finish within the given time limit.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
---
net/dccp/dccp.h | 5 +-
net/dccp/output.c | 115 ++++++++++++++++++++++++++++++-----------------------
net/dccp/proto.c | 21 +++++++++-
net/dccp/timer.c | 2 +-
4 files changed, 89 insertions(+), 54 deletions(-)
--- a/net/dccp/dccp.h
+++ b/net/dccp/dccp.h
@@ -243,8 +243,9 @@ extern void dccp_reqsk_send_ack(struct sock *sk, struct sk_buff *skb,
extern void dccp_send_sync(struct sock *sk, const u64 seq,
const enum dccp_pkt_type pkt_type);
-extern void dccp_write_xmit(struct sock *sk, int block);
-extern void dccp_write_space(struct sock *sk);
+extern void dccp_write_xmit(struct sock *sk);
+extern void dccp_write_space(struct sock *sk);
+extern void dccp_flush_write_queue(struct sock *sk, long *time_budget);
extern void dccp_init_xmit_timers(struct sock *sk);
static inline void dccp_clear_xmit_timers(struct sock *sk)
--- a/net/dccp/output.c
+++ b/net/dccp/output.c
@@ -209,49 +209,29 @@ void dccp_write_space(struct sock *sk)
}
/**
- * dccp_wait_for_ccid - Wait for ccid to tell us we can send a packet
+ * dccp_wait_for_ccid - Await CCID send permission
* @sk: socket to wait for
- * @skb: current skb to pass on for waiting
- * @delay: sleep timeout in milliseconds (> 0)
- * This function is called by default when the socket is closed, and
- * when a non-zero linger time is set on the socket. For consistency
+ * @delay: timeout in jiffies
+ * This is used by CCIDs which need to delay the send time in process context.
*/
-static int dccp_wait_for_ccid(struct sock *sk, struct sk_buff *skb, int delay)
+static int dccp_wait_for_ccid(struct sock *sk, unsigned long delay)
{
- struct dccp_sock *dp = dccp_sk(sk);
DEFINE_WAIT(wait);
- unsigned long jiffdelay;
- int rc;
-
- do {
- dccp_pr_debug("delayed send by %d msec\n", delay);
- jiffdelay = msecs_to_jiffies(delay);
-
- prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
+ long remaining;
- sk->sk_write_pending++;
- release_sock(sk);
- schedule_timeout(jiffdelay);
- lock_sock(sk);
- sk->sk_write_pending--;
+ prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
+ sk->sk_write_pending++;
+ release_sock(sk);
- if (sk->sk_err)
- goto do_error;
- if (signal_pending(current))
- goto do_interrupted;
+ remaining = schedule_timeout(delay);
- rc = ccid_hc_tx_send_packet(dp->dccps_hc_tx_ccid, sk, skb);
- } while ((delay = rc) > 0);
-out:
+ lock_sock(sk);
+ sk->sk_write_pending--;
finish_wait(sk_sleep(sk), &wait);
- return rc;
-
-do_error:
- rc = -EPIPE;
- goto out;
-do_interrupted:
- rc = -EINTR;
- goto out;
+
+ if (signal_pending(current) || sk->sk_err)
+ return -1;
+ return remaining;
}
/**
@@ -305,7 +285,53 @@ static void dccp_xmit_packet(struct sock *sk)
ccid_hc_tx_packet_sent(dp->dccps_hc_tx_ccid, sk, len);
}
-void dccp_write_xmit(struct sock *sk, int block)
+/**
+ * dccp_flush_write_queue - Drain queue at end of connection
+ * Since dccp_sendmsg queues packets without waiting for them to be sent, it may
+ * happen that the TX queue is not empty at the end of a connection. We give the
+ * HC-sender CCID a grace period of up to @time_budget jiffies. If this function
+ * returns with a non-empty write queue, it will be purged later.
+ */
+void dccp_flush_write_queue(struct sock *sk, long *time_budget)
+{
+ struct dccp_sock *dp = dccp_sk(sk);
+ struct sk_buff *skb;
+ long delay, rc;
+
+ while (*time_budget > 0 && (skb = skb_peek(&sk->sk_write_queue))) {
+ rc = ccid_hc_tx_send_packet(dp->dccps_hc_tx_ccid, sk, skb);
+
+ switch (ccid_packet_dequeue_eval(rc)) {
+ case CCID_PACKET_WILL_DEQUEUE_LATER:
+ /*
+ * If the CCID determines when to send, the next sending
+ * time is unknown or the CCID may not even send again
+ * (e.g. remote host crashes or lost Ack packets).
+ */
+ DCCP_WARN("CCID did not manage to send all packets\n");
+ return;
+ case CCID_PACKET_DELAY:
+ delay = msecs_to_jiffies(rc);
+ if (delay > *time_budget)
+ return;
+ rc = dccp_wait_for_ccid(sk, delay);
+ if (rc < 0)
+ return;
+ *time_budget -= (delay - rc);
+ /* check again if we can send now */
+ break;
+ case CCID_PACKET_SEND_AT_ONCE:
+ dccp_xmit_packet(sk);
+ break;
+ case CCID_PACKET_ERR:
+ skb_dequeue(&sk->sk_write_queue);
+ kfree_skb(skb);
+ dccp_pr_debug("packet discarded due to err=%ld\n", rc);
+ }
+ }
+}
+
+void dccp_write_xmit(struct sock *sk)
{
struct dccp_sock *dp = dccp_sk(sk);
struct sk_buff *skb;
@@ -317,19 +343,9 @@ void dccp_write_xmit(struct sock *sk, int block)
case CCID_PACKET_WILL_DEQUEUE_LATER:
return;
case CCID_PACKET_DELAY:
- if (!block) {
- sk_reset_timer(sk, &dp->dccps_xmit_timer,
- msecs_to_jiffies(rc)+jiffies);
- return;
- }
- rc = dccp_wait_for_ccid(sk, skb, rc);
- if (rc && rc != -EINTR) {
- DCCP_BUG("err=%d after dccp_wait_for_ccid", rc);
- skb_dequeue(&sk->sk_write_queue);
- kfree_skb(skb);
- break;
- }
- /* fall through */
+ sk_reset_timer(sk, &dp->dccps_xmit_timer,
+ jiffies + msecs_to_jiffies(rc));
+ return;
case CCID_PACKET_SEND_AT_ONCE:
dccp_xmit_packet(sk);
break;
@@ -648,7 +664,6 @@ void dccp_send_close(struct sock *sk, const int active)
DCCP_SKB_CB(skb)->dccpd_type = DCCP_PKT_CLOSE;
if (active) {
- dccp_write_xmit(sk, 1);
dccp_skb_entail(sk, skb);
dccp_transmit_skb(sk, skb_clone(skb, prio));
/*
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -726,7 +726,13 @@ int dccp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
goto out_discard;
skb_queue_tail(&sk->sk_write_queue, skb);
- dccp_write_xmit(sk,0);
+ /*
+ * The xmit_timer is set if the TX CCID is rate-based and will expire
+ * when congestion control permits to release further packets into the
+ * network. Window-based CCIDs do not use this timer.
+ */
+ if (!timer_pending(&dp->dccps_xmit_timer))
+ dccp_write_xmit(sk);
out_release:
release_sock(sk);
return rc ? : len;
@@ -951,9 +957,22 @@ void dccp_close(struct sock *sk, long timeout)
/* Check zero linger _after_ checking for unread data. */
sk->sk_prot->disconnect(sk, 0);
} else if (sk->sk_state != DCCP_CLOSED) {
+ /*
+ * Normal connection termination. May need to wait if there are
+ * still packets in the TX queue that are delayed by the CCID.
+ */
+ dccp_flush_write_queue(sk, &timeout);
dccp_terminate_connection(sk);
}
+ /*
+ * Flush write queue. This may be necessary in several cases:
+ * - we have been closed by the peer but still have application data;
+ * - abortive termination (unread data or zero linger time),
+ * - normal termination but queue could not be flushed within time limit
+ */
+ __skb_queue_purge(&sk->sk_write_queue);
+
sk_stream_wait_close(sk, timeout);
adjudge_to_death:
--- a/net/dccp/timer.c
+++ b/net/dccp/timer.c
@@ -249,7 +249,7 @@ static void dccp_write_xmitlet(unsigned long data)
if (sock_owned_by_user(sk))
sk_reset_timer(sk, &dccp_sk(sk)->dccps_xmit_timer, jiffies + 1);
else
- dccp_write_xmit(sk, 0);
+ dccp_write_xmit(sk);
bh_unlock_sock(sk);
}
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox