Netdev List
 help / color / mirror / Atom feed
* Adding bridge interface to non-default network namespace crashes kernel
From: Atis Elsts @ 2009-09-07 15:07 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA; +Cc: containers-qjLDD68F18O7TbgM5vRIOg

[-- Attachment #1: Type: text/plain, Size: 491 bytes --]

Trying to add bridge interface from userspace program, after moving the 
program to a new network namespace, causes kernel to crash. I am using latest 
kernel version from git (2.6.31-rc9).
The bug is easy to reproduce - just compile and run the attached C program.

I see that bridge interface has NETIF_F_NETNS_LOCAL flag, but as I understand, 
this flag simply means that a device cannot be *moved* across network 
namespaces, not that it cannot be *created* in other namespaces.

--Atis

[-- Attachment #2: crash.c --]
[-- Type: text/x-csrc, Size: 571 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <linux/sched.h>
#include <linux/sockios.h>

int main() {
    if (unshare(CLONE_NEWNET)) {
        perror("unshare");
        return -1;
    }

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) {
        perror("socket");
        return -1;
    }

    const char *name = "lobridge";
    if (ioctl(fd, SIOCBRADDBR, name) < 0) {
        perror("ioctl SIOCBRADDBR");
        return -1;
    }

    system("ip addr");
    for (;;) {
        printf("still running\n");
        sleep(5);
    }
}

[-- Attachment #3: syslog --]
[-- Type: text/plain, Size: 3289 bytes --]

Sep  7 16:06:19 debian kernel: [   91.040000] ------------[ cut here ]------------
Sep  7 16:06:19 debian kernel: [   91.040000] kernel BUG at fs/sysfs/group.c:65!
Sep  7 16:06:19 debian kernel: [   91.040000] invalid opcode: 0000 [#1] SMP
Sep  7 16:06:19 debian kernel: [   91.040000] last sysfs file: /sys/devices/virtual/net/lo/operstate
Sep  7 16:06:19 debian kernel: [   91.040000] Modules linked in:
Sep  7 16:06:19 debian kernel: [   91.040000]
Sep  7 16:06:19 debian kernel: [   91.040000] Pid: 1667, comm: a.out Not tainted (2.6.31-rc9 #16)
Sep  7 16:06:19 debian kernel: [   91.040000] EIP: 0060:[<c0218c5e>] EFLAGS: 00000246 CPU: 0
Sep  7 16:06:19 debian kernel: [   91.040000] EIP is at internal_create_group+0x14e/0x180
Sep  7 16:06:19 debian kernel: [   91.040000] EAX: 00000000 EBX: c71fb000 ECX: c053aae0 EDX: 00000000
Sep  7 16:06:19 debian kernel: [   91.040000] ESI: 00000000 EDI: c71fb25c EBP: c6dede58 ESP: c6dede30
Sep  7 16:06:19 debian kernel: [   91.040000]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Sep  7 16:06:19 debian kernel: [   91.040000] Process a.out (pid: 1667, ti=c6dec000 task=c6df1920 task.ti=c6dec000)
Sep  7 16:06:19 debian kernel: [   91.040000] Stack:
Sep  7 16:06:19 debian kernel: [   91.040000]  c71fb008 c053aae0 00000000 c71fb25c 00000000 00000000 c6dede58 c71fb000
Sep  7 16:06:19 debian kernel: [   91.040000] <0> 00000000 c71fb25c c6dede60 c0218cbc c6dede84 c03c6bc3 00000000 00000000
Sep  7 16:06:19 debian kernel: [   91.040000] <0> 00000000 c7102000 c71fb000 00000000 c7102000 c6dedea0 c03c348f 00000001
Sep  7 16:06:19 debian kernel: [   91.040000] Call Trace:
Sep  7 16:06:19 debian kernel: [   91.040000]  [<c0218cbc>] ? sysfs_create_group+0xc/0x10
Sep  7 16:06:19 debian kernel: [   91.040000]  [<c03c6bc3>] ? br_sysfs_addbr+0x23/0xf0
Sep  7 16:06:19 debian kernel: [   91.040000]  [<c03c348f>] ? br_add_bridge+0x18f/0x1a0
Sep  7 16:06:19 debian kernel: [   91.040000]  [<c03c4328>] ? br_ioctl_deviceless_stub+0x1f8/0x210
Sep  7 16:06:19 debian kernel: [   91.040000]  [<c03c4130>] ? br_ioctl_deviceless_stub+0x0/0x210
Sep  7 16:06:19 debian kernel: [   91.040000]  [<c032e0cf>] ? sock_ioctl+0xbf/0x220
Sep  7 16:06:19 debian kernel: [   91.040000]  [<c032e010>] ? sock_ioctl+0x0/0x220
Sep  7 16:06:19 debian kernel: [   91.040000]  [<c01d4e28>] ? vfs_ioctl+0x28/0x80
Sep  7 16:06:19 debian kernel: [   91.040000]  [<c01d4faa>] ? do_vfs_ioctl+0x6a/0x520
Sep  7 16:06:19 debian kernel: [   91.040000]  [<c032d2a3>] ? sock_map_fd+0x43/0x70
Sep  7 16:06:19 debian kernel: [   91.040000]  [<c032db22>] ? sys_socket+0x52/0x70
Sep  7 16:06:19 debian kernel: [   91.040000]  [<c0118138>] ? do_page_fault+0x168/0x310
Sep  7 16:06:19 debian kernel: [   91.040000]  [<c01d54b3>] ? sys_ioctl+0x53/0x70
Sep  7 16:06:19 debian kernel: [   91.040000]  [<c0102c04>] ? sysenter_do_call+0x12/0x22
Sep  7 16:06:19 debian kernel: [   91.040000] Code: bd 8b 5d e0 83 45 ec 01 85 db 0f 84 63 ff ff ff 8b 12 89 f8 e8 e4 ce ff ff 8b 16 e9 53 ff ff ff 8b 40 18 85 c0 0f 85 e9 fe ff ff <0f> 0b eb fe 89 f8 e8 47 ea ff ff 8b 45 e8 83 c4 1c 5b 5e 5f 5d
Sep  7 16:06:19 debian kernel: [   91.040000] EIP: [<c0218c5e>] internal_create_group+0x14e/0x180 SS:ESP 0068:c6dede30
Sep  7 16:06:19 debian kernel: [   91.040000] ---[ end trace beadcfdb06c985eb ]---

[-- Attachment #4: Type: text/plain, Size: 206 bytes --]

_______________________________________________
Containers mailing list
Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
https://lists.linux-foundation.org/mailman/listinfo/containers

^ permalink raw reply

* [patch]autosuspend for cdc-ether #2
From: Oliver Neukum @ 2009-09-07 15:16 UTC (permalink / raw)
  To: David Brownell, Torgny Johansson, netdev, linux-usb,
	David S. Miller

Hi,

this allows the use of remote wakeup to autosuspend online
devices. Nobody has made any comments. David, would you apply it?

	Regards
		Oliver

Signed-off-by: Oliver Neukum <oliver@neukum.org>

--

commit 7f2d9a0440f6453462bd7bf95f77eda17c16d865
Author: Oliver Neukum <oliver@neukum.org>
Date:   Fri Aug 28 15:49:15 2009 +0200

    usb: usbnet: runtime power management for active connections
    
    Devices that support remote wakeup can be autosuspended even while the
    interface is up. Transmissions are queued and processed after the device
    has been woken up.

diff --git a/drivers/net/usb/cdc_ether.c b/drivers/net/usb/cdc_ether.c
index 4a6aff5..8ee5bd7 100644
--- a/drivers/net/usb/cdc_ether.c
+++ b/drivers/net/usb/cdc_ether.c
@@ -411,6 +411,12 @@ static int cdc_bind(struct usbnet *dev, struct usb_interface *intf)
 	return 0;
 }
 
+static int cdc_manage_power(struct usbnet *dev, int on)
+{
+	dev->intf->needs_remote_wakeup = on;
+	return 0;
+}
+
 static const struct driver_info	cdc_info = {
 	.description =	"CDC Ethernet Device",
 	.flags =	FLAG_ETHER,
@@ -418,6 +424,7 @@ static const struct driver_info	cdc_info = {
 	.bind =		cdc_bind,
 	.unbind =	usbnet_cdc_unbind,
 	.status =	cdc_status,
+	.manage_power =	cdc_manage_power,
 };
 
 /*-------------------------------------------------------------------------*/
@@ -570,6 +577,7 @@ static struct usb_driver cdc_driver = {
 	.disconnect =	usbnet_disconnect,
 	.suspend =	usbnet_suspend,
 	.resume =	usbnet_resume,
+	.supports_autosuspend = 1,
 };
 
 
diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index edfd9e1..c21b3d2 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -575,6 +575,7 @@ EXPORT_SYMBOL_GPL(usbnet_unlink_rx_urbs);
 int usbnet_stop (struct net_device *net)
 {
 	struct usbnet		*dev = netdev_priv(net);
+	struct driver_info	*info = dev->driver_info;
 	int			temp;
 	DECLARE_WAIT_QUEUE_HEAD_ONSTACK (unlink_wakeup);
 	DECLARE_WAITQUEUE (wait, current);
@@ -612,7 +613,10 @@ int usbnet_stop (struct net_device *net)
 	dev->flags = 0;
 	del_timer_sync (&dev->delay);
 	tasklet_kill (&dev->bh);
-	usb_autopm_put_interface(dev->intf);
+	if (info->manage_power)
+		info->manage_power(dev, 0);
+	else
+		usb_autopm_put_interface(dev->intf);
 
 	return 0;
 }
@@ -693,6 +697,13 @@ int usbnet_open (struct net_device *net)
 
 	// delay posting reads until we're fully open
 	tasklet_schedule (&dev->bh);
+
+	if (info->manage_power) {
+		retval = info->manage_power(dev, 1);
+		if (retval < 0)
+			goto done;
+		usb_autopm_put_interface(dev->intf);
+	}
 	return retval;
 done:
 	usb_autopm_put_interface(dev->intf);
@@ -822,6 +833,7 @@ kevent (struct work_struct *work)
 	if (test_bit (EVENT_TX_HALT, &dev->flags)) {
 		unlink_urbs (dev, &dev->txq);
 		status = usb_clear_halt (dev->udev, dev->out);
+		usb_autopm_put_interface(dev->intf);
 		if (status < 0
 				&& status != -EPIPE
 				&& status != -ESHUTDOWN) {
@@ -893,17 +905,20 @@ static void tx_complete (struct urb *urb)
 	if (urb->status == 0) {
 		dev->net->stats.tx_packets++;
 		dev->net->stats.tx_bytes += entry->length;
+		usb_autopm_put_interface_async(dev->intf);
 	} else {
 		dev->net->stats.tx_errors++;
 
 		switch (urb->status) {
 		case -EPIPE:
+			/* we do not allow autosuspension */
 			usbnet_defer_kevent (dev, EVENT_TX_HALT);
 			break;
 
 		/* software-driven interface shutdown */
 		case -ECONNRESET:		// async unlink
 		case -ESHUTDOWN:		// hardware gone
+			usb_autopm_put_interface_async(dev->intf);
 			break;
 
 		// like rx, tx gets controller i/o faults during khubd delays
@@ -911,6 +926,7 @@ static void tx_complete (struct urb *urb)
 		case -EPROTO:
 		case -ETIME:
 		case -EILSEQ:
+			usb_mark_last_busy(dev->udev);
 			if (!timer_pending (&dev->delay)) {
 				mod_timer (&dev->delay,
 					jiffies + THROTTLE_JIFFIES);
@@ -919,8 +935,10 @@ static void tx_complete (struct urb *urb)
 							urb->status);
 			}
 			netif_stop_queue (dev->net);
+			usb_autopm_put_interface_async(dev->intf);
 			break;
 		default:
+			usb_autopm_put_interface_async(dev->intf);
 			if (netif_msg_tx_err (dev))
 				devdbg (dev, "tx err %d", entry->urb->status);
 			break;
@@ -996,8 +1014,29 @@ int usbnet_start_xmit (struct sk_buff *skb, struct net_device *net)
 		}
 	}
 
+	
+
 	spin_lock_irqsave (&dev->txq.lock, flags);
 
+	retval = usb_autopm_get_interface_async(dev->intf);
+	if (retval < 0) {
+		spin_unlock_irqrestore (&dev->txq.lock, flags);
+		goto drop;
+	}
+
+#ifdef CONFIG_PM
+	/* if this triggers the device is still a sleep */
+	if (test_bit(EVENT_DEV_ASLEEP, &dev->flags)) {
+		/* transmission will be done in resume */
+		dev->deferred = urb;
+		/* no use to process more packets */
+		netif_stop_queue(net);
+		spin_unlock_irqrestore(&dev->txq.lock, flags);
+		retval = NET_XMIT_SUCCESS;
+		goto deferred;
+	}
+#endif
+
 	switch ((retval = usb_submit_urb (urb, GFP_ATOMIC))) {
 	case -EPIPE:
 		netif_stop_queue (net);
@@ -1028,6 +1067,7 @@ drop:
 		devdbg (dev, "> tx, len %d, type 0x%x",
 			length, skb->protocol);
 	}
+deferred:
 	return retval;
 }
 EXPORT_SYMBOL_GPL(usbnet_start_xmit);
@@ -1303,6 +1343,15 @@ int usbnet_suspend (struct usb_interface *intf, pm_message_t message)
 	struct usbnet		*dev = usb_get_intfdata(intf);
 
 	if (!dev->suspend_count++) {
+		spin_lock_irq(&dev->txq.lock);
+		/* don't autosuspend while transmitting */
+		if (dev->txq.qlen && (message.event & PM_EVENT_AUTO)) {
+			spin_unlock_irq(&dev->txq.lock);
+			return -EBUSY;
+		} else {
+			set_bit(EVENT_DEV_ASLEEP, &dev->flags);
+			spin_unlock_irq(&dev->txq.lock);
+		}
 		/*
 		 * accelerate emptying of the rx and queues, to avoid
 		 * having everything error out.
@@ -1322,11 +1371,34 @@ EXPORT_SYMBOL_GPL(usbnet_suspend);
 
 int usbnet_resume (struct usb_interface *intf)
 {
-	struct usbnet		*dev = usb_get_intfdata(intf);
-
-	if (!--dev->suspend_count)
+	struct usbnet           *dev = usb_get_intfdata(intf);
+	struct sk_buff          *skb;
+	struct urb              *res;
+	int                     retval;
+	
+	if (!--dev->suspend_count) {
+		spin_lock_irq(&dev->txq.lock);
+		res = dev->deferred;
+		dev->deferred = NULL;
+		clear_bit(EVENT_DEV_ASLEEP, &dev->flags);
+		spin_unlock_irq(&dev->txq.lock);
+		if (res) {
+			retval = usb_submit_urb(res, GFP_NOIO);
+			if (retval < 0) {
+				usb_free_urb(res);
+				netif_start_queue(dev->net);
+				usb_autopm_put_interface_async(dev->intf);
+			} else {
+				skb = (struct sk_buff *)res->context;
+				dev->net->trans_start = jiffies;
+				__skb_queue_tail (&dev->txq, skb);
+				if (!(dev->txq.qlen >= TX_QLEN(dev)))
+					netif_start_queue(dev->net);
+			}
+		}
 		tasklet_schedule (&dev->bh);
-
+	}
+	
 	return 0;
 }
 EXPORT_SYMBOL_GPL(usbnet_resume);
diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h
index 310e18a..6fa0545 100644
--- a/include/linux/usb/usbnet.h
+++ b/include/linux/usb/usbnet.h
@@ -54,6 +54,7 @@ struct usbnet {
 	struct sk_buff_head	txq;
 	struct sk_buff_head	done;
 	struct urb		*interrupt;
+	struct urb		*deferred;
 	struct tasklet_struct	bh;
 
 	struct work_struct	kevent;
@@ -63,6 +64,8 @@ struct usbnet {
 #		define EVENT_RX_MEMORY	2
 #		define EVENT_STS_SPLIT	3
 #		define EVENT_LINK_RESET	4
+#		define EVENT_DEV_WAKING 5
+#		define EVENT_DEV_ASLEEP 6
 };
 
 static inline struct usb_driver *driver_of(struct usb_interface *intf)
@@ -100,6 +103,9 @@ struct driver_info {
 	/* see if peer is connected ... can sleep */
 	int	(*check_connect)(struct usbnet *);
 
+	/* (dis)activate runtime power management */
+	int	(*manage_power)(struct usbnet *, int);
+
 	/* for status polling */
 	void	(*status)(struct usbnet *, struct urb *);
 



^ permalink raw reply related

* Re: Staging: cpc-usb CAN driver TODO list
From: Sebastian Haas @ 2009-09-07 15:35 UTC (permalink / raw)
  To: Oliver Hartkopp
  Cc: Wolfgang Grandegger, Greg KH, Linux Netdev List, Felipe Balbi
In-Reply-To: <4AA4E93F.90209@ems-wuensche.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Oliver,

Okay, I've just finished the first version today. It is much less effort
 as I previously thought. 95% is written and compiles fine. Just the
sending part is missing and optimization (it is just a basic
implementation without USB anchors and stuff like that).

I think the further development should proceed at the Socket-CAN mailing
list, cause of things needs to be clarified specific to the core.

Cheers,
Sebastian

Sebastian Haas schrieb:
> Okay, thanks for the tips. Lets see what I can create today.
> 
> Sebastian
> 
> Oliver Hartkopp schrieb:
>> Wolfgang Grandegger wrote:
>>> On 09/07/2009 10:01 AM, Sebastian Haas wrote:
>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>> Hash: SHA1
>>>>
>>>> Wolfgang,
>>>>
>>>> Wolfgang Grandegger schrieb:
>>>>> Hi Sebastian,
>>>>>
>>>>> On 09/07/2009 07:56 AM, Sebastian Haas wrote:
>>>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>>>> Hash: SHA1
>>>>>>
>>>>>> Oliver,
>>>>>>
>>>>>> I'm not yet sure how to actually start the development. There is so
>>>>>> much
>>>>>> to do, and I've not much time to spend on this, unfortunately. Because
>>>>>> of this I can't rewrite the whole driver on my own in order to get a
>>>>>> Socket-CAN driver but I can provide support, review patches, rent
>>>>>> devices and make tests here.
>>>>>>
>>>>>> Oliver, you are not familiar with USB and I'm not very familiar with
>>>>>> CAN
>>>>>> netdev internals, why not combining these twos. You are writing the CAN
>>>>>> part and write the USB part.
>>>>>>
>>>>>> I'll also write a specification which contains any information you need
>>>>>> to develop a CAN driver for the device (commands, sequences, error
>>>>>> handling).
>>>>> Alternatively, EMS Wuensche could also hire an expert doing the job ;-).
>>>>> Note that we do a lot of Socket-CAN work in our free time, which is a
>>>>> limited resource. Progress depends on funding to a certain extend.
>>>> Money is also a limited resource. ;-)
>>>>
>>>> Let's become serious again, I know and respect that many of Socket-CAN
>>>> and the Staging developers spend their free time working on it. We will
>>>> of course work on the driver, but since we've not much time it may take
>>>> several months. If someone wants to help, we would be very glad and
>>>> happy to support the person as far as we can with devices, answers and
>>>> tests.
>>> OK, no problem. I really appreciate your support for Socket-CAN so far.
>> Indeed. Me too.
> 
>> I tried to take a second look into cpc-usb_drv.c and i would suggest to remove
>> all the procfs and the chardev stuff and then create a CAN netdev when you
>> identified an USB node analogue to
> 
>>         /* Detect available channels */
>>         for (i = 0; i < EMS_PCMCIA_MAX_CHAN; i++) {
>>                 dev = alloc_sja1000dev(0);
>>                 if (dev == NULL) {
>>                         err = -ENOMEM;
>>                         goto failure_cleanup;
>>                 }
> 
>>                 card->net_dev[i] = dev;
>>                 priv = netdev_priv(dev);
>>                 priv->priv = card;
>>                 SET_NETDEV_DEV(dev, &pdev->dev);
> 
>> as you know from your ems_pcmcia.c driver
> 
>> and
> 
>> struct net_device *alloc_sja1000dev(int sizeof_priv)
>> {
>>         struct net_device *dev;
>>         struct sja1000_priv *priv;
> 
>>         dev = alloc_candev(sizeof(struct sja1000_priv) + sizeof_priv);
>>         if (!dev)
>>                 return NULL;
> 
>>         priv = netdev_priv(dev);
> 
>>         priv->dev = dev;
>>         priv->can.bittiming_const = &sja1000_bittiming_const;
>>         priv->can.do_set_bittiming = sja1000_set_bittiming;
>>         priv->can.do_set_mode = sja1000_set_mode;
> 
>>         if (sizeof_priv)
>>                 priv->priv = (void *)priv + sizeof(struct sja1000_priv);
> 
>>         return dev;
>> }
> 
>> as you know from the sja1000.c (which can probably be used for the
>> LPC2119_PRODUCT_ID we should try to implement first).
> 
>> Then we need something like this stuff
> 
>> static const struct net_device_ops sja1000_netdev_ops = {
>>         .ndo_open               = sja1000_open,
>>         .ndo_stop               = sja1000_close,
>>         .ndo_start_xmit         = sja1000_start_xmit,
>> };
> 
>> int register_sja1000dev(struct net_device *dev)
>> {
>>         if (!sja1000_probe_chip(dev))
>>                 return -ENODEV;
> 
>>         dev->netdev_ops = &sja1000_netdev_ops;
> 
>>         dev->flags |= IFF_ECHO; /* we support local echo */
> 
>>         set_reset_mode(dev);
>>         chipset_init(dev);
> 
>>         return register_candev(dev);
>> }
> 
>> from sja1000.c
> 
>> And then we have an USB CAN node that has a belonging CAN netdevice (maybe
>> there is something else we can look at that's used in other USB ethernet
>> adapters).
> 
>> I know from the PEAK USB driver at
> 
>> http://www.peak-system.com/fileadmin/media/linux/files/peak-linux-driver.6.11.tar.gz
> 
>> that i just needed to duplicate and modify the usb rx/tx stuff and redirect
>> the CAN frames into the network stack. But this PEAK driver does not have a
>> netlink configuration interface and can only be taken as a limited example ...
> 
>> I assume, when the driver (cpc_usb.c or ems_usb.c analogue to the ems_pcmcia.c
>> ?) is prepared as described above, one can go and connect the rx/tx dataflow
>> and the netlink configuration.
> 
>> Unfortunately i'm short of time the next two weeks but maybe you can start and
>> create such a new C-file (probably based on ems_pcmcia.c) ?
> 
>> Best regards,
>> Oliver
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkqlKDsACgkQpqRB8PJG7XwpdgCglYjvGi/z1z5tX1ozgF/YTA1j
9pwAnAkIHScIlic0KZNZEDWT09jUt3N1
=OVN3
-----END PGP SIGNATURE-----
-- 
EMS Dr. Thomas Wuensche e.K.
Sonnenhang 3
85304 Ilmmuenster
HRA Neuburg a.d. Donau, HR-Nr. 70.106
Phone: +49-8441-490260
Fax  : +49-8441-81860
http://www.ems-wuensche.com

^ permalink raw reply

* Re: multicast routing and multiple interfaces with same IP
From: Ilia K. @ 2009-09-07 15:35 UTC (permalink / raw)
  To: David Miller; +Cc: opurdila, netdev
In-Reply-To: <20090829.000154.263733862.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 1106 bytes --]

Hi,
I'm attaching a new patch. The changes:
mroute.h:
- consistent name for interface index: vifc_lcl_ifindex
- union of vifc_lcl_addr and vifc_lcl_ifindex since either one of them
can be used
ipmr.c:
- case VIFF_USE_IFINDEX and case 0 had almost the same code, so using
fall through and if to eliminate code duplication

Regards,
Ilia.

On Sat, Aug 29, 2009 at 10:01 AM, David Miller<davem@davemloft.net> wrote:
> From: Octavian Purdila <opurdila@ixiacom.com>
> Date: Thu, 27 Aug 2009 02:53:18 +0300
>
>> I don't have context on multicast routing, but this caught my attention:
>>
>>>@@ -61,11 +61,13 @@
>>>      unsigned int vifc_rate_limit;   /* Rate limiter values (NI) */
>>>      struct in_addr vifc_lcl_addr;   /* Our address */
>>>      struct in_addr vifc_rmt_addr;   /* IPIP tunnel addr */
>>>+     int ifindex;                    /* Local interface index */
>>> };
>>>
>>
>> Wouldn't this break userspace ABI?
>>
>> Perhaps you could use a union between vifc_lcl_addr and vifc_ifindex, they seem
>> to be exclusive.
>
> Indeed, this will need to be fixed up.
>

[-- Attachment #2: vif_add.patch --]
[-- Type: text/x-diff, Size: 1587 bytes --]

=== modified file 'include/linux/mroute.h'
--- include/linux/mroute.h	2009-08-10 11:17:32 +0000
+++ include/linux/mroute.h	2009-09-07 15:16:43 +0000
@@ -59,13 +59,17 @@
 	unsigned char vifc_flags;	/* VIFF_ flags */
 	unsigned char vifc_threshold;	/* ttl limit */
 	unsigned int vifc_rate_limit;	/* Rate limiter values (NI) */
-	struct in_addr vifc_lcl_addr;	/* Our address */
+	union {
+		struct in_addr vifc_lcl_addr;     /* Local interface address */
+		int            vifc_lcl_ifindex;  /* Local interface index   */
+	};
 	struct in_addr vifc_rmt_addr;	/* IPIP tunnel addr */
 };
 
-#define VIFF_TUNNEL	0x1	/* IPIP tunnel */
-#define VIFF_SRCRT	0x2	/* NI */
-#define VIFF_REGISTER	0x4	/* register vif	*/
+#define VIFF_TUNNEL		0x1	/* IPIP tunnel */
+#define VIFF_SRCRT		0x2	/* NI */
+#define VIFF_REGISTER		0x4	/* register vif	*/
+#define VIFF_USE_IFINDEX	0x8	/* use vifc_lcl_ifindex to find an interface */
 
 /*
  *	Cache manipulation structures for mrouted and PIMd

=== modified file 'net/ipv4/ipmr.c'
--- net/ipv4/ipmr.c	2009-08-10 11:17:32 +0000
+++ net/ipv4/ipmr.c	2009-09-07 15:20:07 +0000
@@ -470,8 +470,18 @@
 			return err;
 		}
 		break;
+
+	case VIFF_USE_IFINDEX:
 	case 0:
-		dev = ip_dev_find(net, vifc->vifc_lcl_addr.s_addr);
+		if (vifc->vifc_flags==VIFF_USE_IFINDEX) {
+			dev = dev_get_by_index(net, vifc->vifc_lcl_ifindex);
+			if (dev && dev->ip_ptr == NULL) {
+				dev_put(dev);
+				return -EADDRNOTAVAIL;
+			}
+		} else {
+			dev = ip_dev_find(net, vifc->vifc_lcl_addr.s_addr);
+		}
 		if (!dev)
 			return -EADDRNOTAVAIL;
 		err = dev_set_allmulti(dev, 1);


^ permalink raw reply

* Re: [iproute2] tc action mirred    question
From: Xiaofei Wu @ 2009-09-07 16:05 UTC (permalink / raw)
  To: hadi; +Cc: linux netdev


I am a newbie for 'traffic control' and Linux networking. So I ask some experts here  to help me.
Maybe my questions are stupid. But I hope I can get your reply. Thank you!


>> I just want to know:
>> 1) Could I forward the mirroring packets to another node ,and then route it to the destination(if I use 
>> iproute2 (ip, tc ...)  )?  I described my purpose in my last email.

>Yes, you can mirror to another node(B/D). To route on that node(B/D),
>your dst MAC address has to be correct for that destination node(B/D) to
>accept it. You could try to run  the destination node in promisc mode
>and you may be able to get away without changing dst mac.

(1) Could I use  pedit action to modify the dst MAC, so the destination node D will accept it, then forward it to 
node C?  (or use other tools to modify the dst MAC, please give me more information)

(2) If I use 'ifconfig wlan0 promisc ... ' on node D, would it route the mirroring packets (the dst MAC is incorrect)
to node C?


>> 2)  After I mirrored the packets, I should use 'ip route' , 'ip rule' to modify route tables.  Is this right?
>> 

>Assuming you are talking about B/D, yes you can do routing there if the
>node accepts it..


Regards,
Wu



      

^ permalink raw reply

* Re: multicast routing and multiple interfaces with same IP
From: Octavian Purdila @ 2009-09-07 16:25 UTC (permalink / raw)
  To: Ilia K.; +Cc: David Miller, netdev
In-Reply-To: <1b9338490909070835t3517bc36o396539e7ea1721fc@mail.gmail.com>

On Monday 07 September 2009 18:35:41 you wrote:
> Hi,
> I'm attaching a new patch. The changes:
> mroute.h:
> - consistent name for interface index: vifc_lcl_ifindex
> - union of vifc_lcl_addr and vifc_lcl_ifindex since either one of them
> can be used
> ipmr.c:
> - case VIFF_USE_IFINDEX and case 0 had almost the same code, so using
> fall through and if to eliminate code duplication
> 

Hi Ilia,

Looks good to me, but there are a couple of code style issues reported by 
./scripts/checkpatch.pl. 

Also, here:

>+               } else {                                                                                                                       
>+                       dev = ip_dev_find(net, vifc->vifc_lcl_addr.s_addr); 
>+               }                                                                                                                              

Usually no braces are used for single line statements in if/else.

tavi




^ permalink raw reply

* Next Sept 7: Bug : skb_release_head_state on x86
From: Sachin Sant @ 2009-09-07 16:49 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Rothwell, linux-next, David Miller
In-Reply-To: <20090907210206.7830ba68.sfr@canb.auug.org.au>

Today's next kernel running on a x86 box crashed with

BUG: unable to handle kernel paging request at 00010090
IP: [<c034559d>] skb_release_head_state+0x20/0xac
*pdpt = 000000003455c001 *pde = 0000000000000000
Oops: 0002 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu3/topology/core_siblings
Modules linked in: ipv6 microcode fuse loop dm_mod ppdev rtc_cmos i2c_piix4
rtc_core i2c_core rtc_lib button sr_mod tg3 parport_pc sworks_agp cdrom floppy
parport agpgart pcspkr libphy sg ohci_hcd ehci_hcd sd_mod crc_t10dif usbcore
edd fan ide_pci_generic serverworks ide_core ata_generic pata_serverworks
libata ips scsi_mod thermal processor thermal_sys hwmon [last unloaded:
speedstep_lib]

Pid: 6, comm: ksoftirqd/1 Not tainted (2.6.31-rc9-autotest-next-20090907-5-pae
#1) eserver xSeries 235 -[86717AX]-
EIP: 0060:[<c034559d>] EFLAGS: 00010206 CPU: 1
EIP is at skb_release_head_state+0x20/0xac
EAX: 00000000 EBX: f44b5200 ECX: f44b5200 EDX: 00010090
ESI: f5548000 EDI: 00000000 EBP: f5c69dd4 ESP: f5c69dd0
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process ksoftirqd/1 (pid: 6, ti=f5c68000 task=f5c4f280 task.ti=f5c68000)
Stack:
 f44b5200 f5c69de0 c0345398 f5c69e48 f5c69de8 c034542e f5c69e58 c0388807
<0> f44b5200 f5582900 ced1a038 c07ac124 ced1a030 3e6f7c09 eb152044 f4b4bc00
<0> 00000006 c05a594c f5c69e30 c036a2c0 c07ac124 f4b4bc00 f4b4bc00 eb152030
Call Trace:
 [<c0345398>] ? __kfree_skb+0xb/0x71
 [<c034542e>] ? consume_skb+0x30/0x32
 [<c0388807>] ? arp_process+0x572/0x58e
 [<c036a2c0>] ? ip_local_deliver_finish+0x143/0x207
 [<c0388907>] ? arp_rcv+0xda/0xed
 [<c034bdc2>] ? netif_receive_skb+0x43a/0x459
 [<c034bee4>] ? napi_skb_finish+0x1e/0x33
 [<c034c267>] ? napi_gro_receive+0x20/0x24
 [<f8b3667f>] ? tg3_poll+0x5ed/0x802 [tg3]
 [<c034c351>] ? net_rx_action+0x93/0x173
 [<c013769c>] ? __do_softirq+0xa7/0x144
 [<c013775f>] ? do_softirq+0x26/0x2b
 [<c01377ae>] ? ksoftirqd+0x4a/0xae
 [<c0137764>] ? ksoftirqd+0x0/0xae
 [<c0146a2e>] ? kthread+0x61/0x66
 [<c01469cd>] ? kthread+0x0/0x66
 [<c0103507>] ? kernel_thread_helper+0x7/0x10
Code: fe ff ff 83 c4 0c 5b 5e 5f 5d c3 55 89 e5 53 89 c3 8b 40 18 85 c0 74 05
e8 22 ae 00 00 8b 53 1c c7 43 18 00 00 00 00 85 d2 74 11 <f0> ff 0a 0f 94 c0 84
c0 74 07 89 d0 e8 81 c6 05 00 83 7b 6c 00
EIP: [<c034559d>] skb_release_head_state+0x20/0xac SS:ESP 0068:f5c69dd0
CR2: 0000000000010090
---[ end trace 64c8710cf222dc04 ]---

At the time of crash, kernbench was running on this box.

The corresponding c code is : 

0000000000002387 <skb_release_head_state>:
static void skb_release_head_state(struct sk_buff *skb) 
{
    2387:       55                      push   %rbp  
    2388:       48 89 e5                mov    %rsp,%rbp
    238b:       53                      push   %rbx  
    238c:       48 89 fb                mov    %rdi,%rbx
    238f:       48 83 ec 08             sub    $0x8,%rsp
skb_dst_drop():
/usr/local/autobench/var/tmp/build/linux/include/net/dst.h:179
}
...... <SNIP> ......
...... <SNIP> ......

skb_release_head_state():
/usr/local/autobench/var/tmp/build/linux/net/core/skbuff.c:395
        skb_dst_drop(skb);
#ifdef CONFIG_XFRM
        secpath_put(skb->sp);
    23a1:       48 8b 7b 30             mov    0x30(%rbx),%rdi
skb_dst_drop():
/usr/local/autobench/var/tmp/build/linux/include/net/dst.h:181
        skb->_skb_dst = 0UL;
    23a5:       48 c7 43 28 00 00 00    movq   $0x0,0x28(%rbx)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  This line
    23ac:       00

Let me know what other information is required to debug this.

Thanks
-Sachin


-- 

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------


^ permalink raw reply

* Re: net_sched 05/07: reintroduce dev->qdisc for use by sch_api
From: Jarek Poplawski @ 2009-09-07 16:49 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev
In-Reply-To: <4AA507AD.6000403@trash.net>

On Mon, Sep 07, 2009 at 03:16:29PM +0200, Patrick McHardy wrote:
> Jarek Poplawski wrote:
> >> @@ -1383,7 +1375,7 @@ static int tc_ctl_tclass(struct sk_buff *skb, struct nlmsghdr *n, void *arg)
> >>  			pid = TC_H_MAKE(qid, pid);
> >>  	} else {
> >>  		if (qid == 0)
> >> -			qid = dev_queue->qdisc_sleeping->handle;
> >> +			qid = dev->qdisc->handle;
> > 
> > Probably I miss something, but in mq root case it seems to never do
> > anything we need. If so, it could be the example of possible issues
> > elsewhere.
> 
> Sorry, I'm not sure what you're saying ..
> 
> > I thought this mq virtual root qdisc could be done more transparently
> > and invisible for the current code, but it seems, in your
> > implementation some pointers like this, or parent ids (especially
> > TC_H_ROOT) might be different, and even if it works OK, needs a lot of
> > verification. So, my question is, if it's really necessary.
> 
> Same here.

Nevermind! I simply had a dream there could be preserved some old
meaning of "root" etc. within a queue but it doesn't make a sense with
this kind of interface.

Jarek P.

^ permalink raw reply

* Re: Next Sept 7: Bug : skb_release_head_state on x86
From: Eric Dumazet @ 2009-09-07 17:17 UTC (permalink / raw)
  To: Sachin Sant; +Cc: netdev, Stephen Rothwell, linux-next, David Miller
In-Reply-To: <4AA5399A.405@in.ibm.com>

Sachin Sant a écrit :
> Today's next kernel running on a x86 box crashed with
> 
> BUG: unable to handle kernel paging request at 00010090
> IP: [<c034559d>] skb_release_head_state+0x20/0xac
> *pdpt = 000000003455c001 *pde = 0000000000000000
> Oops: 0002 [#1] SMP
> last sysfs file: /sys/devices/system/cpu/cpu3/topology/core_siblings
> Modules linked in: ipv6 microcode fuse loop dm_mod ppdev rtc_cmos i2c_piix4
> rtc_core i2c_core rtc_lib button sr_mod tg3 parport_pc sworks_agp cdrom
> floppy
> parport agpgart pcspkr libphy sg ohci_hcd ehci_hcd sd_mod crc_t10dif
> usbcore
> edd fan ide_pci_generic serverworks ide_core ata_generic pata_serverworks
> libata ips scsi_mod thermal processor thermal_sys hwmon [last unloaded:
> speedstep_lib]
> 
> Pid: 6, comm: ksoftirqd/1 Not tainted
> (2.6.31-rc9-autotest-next-20090907-5-pae
> #1) eserver xSeries 235 -[86717AX]-
> EIP: 0060:[<c034559d>] EFLAGS: 00010206 CPU: 1
> EIP is at skb_release_head_state+0x20/0xac
> EAX: 00000000 EBX: f44b5200 ECX: f44b5200 EDX: 00010090
> ESI: f5548000 EDI: 00000000 EBP: f5c69dd4 ESP: f5c69dd0
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Process ksoftirqd/1 (pid: 6, ti=f5c68000 task=f5c4f280 task.ti=f5c68000)
> Stack:
> f44b5200 f5c69de0 c0345398 f5c69e48 f5c69de8 c034542e f5c69e58 c0388807
> <0> f44b5200 f5582900 ced1a038 c07ac124 ced1a030 3e6f7c09 eb152044 f4b4bc00
> <0> 00000006 c05a594c f5c69e30 c036a2c0 c07ac124 f4b4bc00 f4b4bc00 eb152030
> Call Trace:

This is a crash on a 32bit kernel

> [<c0345398>] ? __kfree_skb+0xb/0x71
> [<c034542e>] ? consume_skb+0x30/0x32
> [<c0388807>] ? arp_process+0x572/0x58e
> [<c036a2c0>] ? ip_local_deliver_finish+0x143/0x207
> [<c0388907>] ? arp_rcv+0xda/0xed
> [<c034bdc2>] ? netif_receive_skb+0x43a/0x459
> [<c034bee4>] ? napi_skb_finish+0x1e/0x33
> [<c034c267>] ? napi_gro_receive+0x20/0x24
> [<f8b3667f>] ? tg3_poll+0x5ed/0x802 [tg3]
> [<c034c351>] ? net_rx_action+0x93/0x173
> [<c013769c>] ? __do_softirq+0xa7/0x144
> [<c013775f>] ? do_softirq+0x26/0x2b
> [<c01377ae>] ? ksoftirqd+0x4a/0xae
> [<c0137764>] ? ksoftirqd+0x0/0xae
> [<c0146a2e>] ? kthread+0x61/0x66
> [<c01469cd>] ? kthread+0x0/0x66
> [<c0103507>] ? kernel_thread_helper+0x7/0x10
> Code: fe ff ff 83 c4 0c 5b 5e 5f 5d c3 55 89 e5 53 89 c3 8b 40 18 85 c0
> 74 05
> e8 22 ae 00 00 8b 53 1c c7 43 18 00 00 00 00 85 d2 74 11 <f0> ff 0a 0f
> 94 c0 84
> c0 74 07 89 d0 e8 81 c6 05 00 83 7b 6c 00
> EIP: [<c034559d>] skb_release_head_state+0x20/0xac SS:ESP 0068:f5c69dd0
> CR2: 0000000000010090
> ---[ end trace 64c8710cf222dc04 ]---
> 
> At the time of crash, kernbench was running on this box.
> 
> The corresponding c code is :
> 0000000000002387 <skb_release_head_state>:
> static void skb_release_head_state(struct sk_buff *skb) {


and you decode a 64 bits kernel

>    2387:       55                      push   %rbp     2388:       48 89
> e5                mov    %rsp,%rbp
>    238b:       53                      push   %rbx     238c:       48 89
> fb                mov    %rdi,%rbx
>    238f:       48 83 ec 08             sub    $0x8,%rsp
> skb_dst_drop():
> /usr/local/autobench/var/tmp/build/linux/include/net/dst.h:179
> }
> ...... <SNIP> ......
> ...... <SNIP> ......
> 
> skb_release_head_state():
> /usr/local/autobench/var/tmp/build/linux/net/core/skbuff.c:395
>        skb_dst_drop(skb);
> #ifdef CONFIG_XFRM
>        secpath_put(skb->sp);
>    23a1:       48 8b 7b 30             mov    0x30(%rbx),%rdi
> skb_dst_drop():
> /usr/local/autobench/var/tmp/build/linux/include/net/dst.h:181
>        skb->_skb_dst = 0UL;
>    23a5:       48 c7 43 28 00 00 00    movq   $0x0,0x28(%rbx)
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  This line
>    23ac:       00
> 

This is more probably
 
<f0> ff 0a       lock decl (%edx)

part of : 

secpath_put(skb->sp);


So some skb has a strange/buggy skb->sp (value 0x00010090)

It looks like skb->cb[xxx] overwrote skb->sp

Please check you have CONFIG_XFRM=y, and that you did rebuild all your modules after
patching your kernel...

^ permalink raw reply

* Re: net_sched 00/07: classful multiqueue dummy scheduler
From: Eric Dumazet @ 2009-09-07 17:21 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, netdev
In-Reply-To: <4AA5175F.6030600@trash.net>

Patrick McHardy a écrit :
> Patrick McHardy wrote:
>> Eric Dumazet wrote:
>>> Had very litle time to test this, but got problems very fast, if rate estimator configured.
>> I didn't test that, but I'll look into it.
>>
>>> qdisc mq 1: root
>>>  Sent 528702 bytes 3491 pkt (dropped 0, overlimits 0 requeues 0)
>>>  rate 177925Kbit 49pps backlog 0b 0p requeues 0
>>> qdisc pfifo 8001: parent 1:1 limit 1000p
>>>  Sent 528702 bytes 3491 pkt (dropped 0, overlimits 0 requeues 0)
>>>  rate 25400bit 21pps backlog 0b 0p requeues 0
>>>
>>> <<<crash>>>
>> Did you capture the crash?

No, in fact it was a freeze.

>>
>>> (On another term I had a "ping -i 0.1 192.168.20.120" that gave :
>>>
>>> 2009/08/07 14:53:42.498 64 bytes from 192.168.20.120: icmp_seq=1982 ttl=64 time=0.126 ms
>>> 2009/08/07 14:53:42.598 64 bytes from 192.168.20.120: icmp_seq=1983 ttl=64 time=0.118 ms
>>> 2009/08/07 14:53:42.698 64 bytes from 192.168.20.120: icmp_seq=1984 ttl=64 time=0.114 ms
>>> 2009/08/07 14:53:42.798 64 bytes from 192.168.20.120: icmp_seq=1985 ttl=64 time=0.123 ms
>>> 2009/08/07 14:53:42.898 64 bytes from 192.168.20.120: icmp_seq=1986 ttl=64 time=0.126 ms
>>> 2009/08/07 14:53:42.998 64 bytes from 192.168.20.120: icmp_seq=1987 ttl=64 time=0.119 ms
>>> 2009/08/07 14:53:43.098 64 bytes from 192.168.20.120: icmp_seq=1988 ttl=64 time=0.122 ms
>>> 2009/08/07 14:53:43.198 64 bytes from 192.168.20.120: icmp_seq=1989 ttl=64 time=0.119 ms
>>> 2009/08/07 14:53:43.298 64 bytes from 192.168.20.120: icmp_seq=1990 ttl=64 time=0.117 ms
>>> 2009/08/07 14:53:43.398 64 bytes from 192.168.20.120: icmp_seq=1991 ttl=64 time=0.117 ms
>>> ping: sendmsg: No buffer space available
>> Was this also with rate estimators? No buffer space available
>> indicates that some class/qdisc isn't dequeued or the packets
>> are leaking, so the output of tc -s -d qdisc show ... might be
>> helpful.
> 
> I figured out the bug, which is likely responsible for both
> problems. When grafting a mq class and creating a rate estimator,
> the new qdisc is not attached to the device queue yet and also
> doesn't have TC_H_ROOT as parent, so qdisc_create() selects
> qdisc_root_sleeping_lock() for the estimator, which belongs to
> the qdisc that is getting replaced.
> 
> This is a patch I used for testing, but I'll come up with
> something more elegant (I hope) as a final fix :)

Yes, this was the problem, and your patch fixed it.

Now adding CONFIG_SLUB_DEBUG_ON=y for next tries :)

Sep  7 16:37:55 erd kernel: [  217.056813] =============================================================================
Sep  7 16:37:55 erd kernel: [  217.056865] BUG kmalloc-256: Poison overwritten
Sep  7 16:37:55 erd kernel: [  217.056910] -----------------------------------------------------------------------------
Sep  7 16:37:55 erd kernel: [  217.056911]
Sep  7 16:37:55 erd kernel: [  217.056990] INFO: 0xf6e622bc-0xf6e622bd. First byte 0x76 instead of 0x6b
Sep  7 16:37:55 erd kernel: [  217.057049] INFO: Allocated in qdisc_alloc+0x1b/0x80 age=154593 cpu=2 pid=5165
Sep  7 16:37:55 erd kernel: [  217.057094] INFO: Freed in qdisc_destroy+0x88/0xa0 age=139186 cpu=4 pid=5173
Sep  7 16:37:55 erd kernel: [  217.057139] INFO: Slab 0xc16ddc40 objects=26 used=6 fp=0xf6e62260 flags=0x28040c3
Sep  7 16:37:55 erd kernel: [  217.057184] INFO: Object 0xf6e62260 @offset=608 fp=0xf6e62850
Sep  7 16:37:55 erd kernel: [  217.057184]
Sep  7 16:37:55 erd kernel: [  217.057259] Bytes b4 0xf6e62250:  d9 04 00 00 fc 6f fb ff 5a 5a 5a 5a 5a 5a 5a 5a Ù...üoûÿZZZZZZZZ
Sep  7 16:37:55 erd kernel: [  217.057771]   Object 0xf6e62260:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Sep  7 16:37:55 erd kernel: [  217.057771]   Object 0xf6e62270:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Sep  7 16:37:55 erd kernel: [  217.057771]   Object 0xf6e62280:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Sep  7 16:37:55 erd kernel: [  217.057771]   Object 0xf6e62290:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Sep  7 16:37:55 erd kernel: [  217.057771]   Object 0xf6e622a0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Sep  7 16:37:55 erd kernel: [  217.057771]   Object 0xf6e622b0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 76 76 6b 6b kkkkkkkkkkkkvvkk
Sep  7 16:37:55 erd kernel: [  217.057771]   Object 0xf6e622c0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Sep  7 16:37:55 erd kernel: [  217.057771]   Object 0xf6e622d0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Sep  7 16:37:55 erd kernel: [  217.057771]   Object 0xf6e622e0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Sep  7 16:37:55 erd kernel: [  217.057771]   Object 0xf6e622f0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Sep  7 16:37:55 erd kernel: [  217.057771]   Object 0xf6e62300:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Sep  7 16:37:55 erd kernel: [  217.057771]   Object 0xf6e62310:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Sep  7 16:37:55 erd kernel: [  217.057771]   Object 0xf6e62320:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Sep  7 16:37:55 erd kernel: [  217.057771]   Object 0xf6e62330:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Sep  7 16:37:55 erd kernel: [  217.057771]   Object 0xf6e62340:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Sep  7 16:37:55 erd kernel: [  217.057771]   Object 0xf6e62350:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk¥
Sep  7 16:37:55 erd kernel: [  217.057771]  Redzone 0xf6e62360:  bb bb bb bb                                     »»»»
Sep  7 16:37:55 erd kernel: [  217.057771]  Padding 0xf6e62388:  5a 5a 5a 5a 5a 5a 5a 5a                         ZZZZZZZZ
Sep  7 16:37:55 erd kernel: [  217.057771] Pid: 5334, comm: bash Not tainted 2.6.31-rc5-04006-gedfbc1d-dirty #188
Sep  7 16:37:55 erd kernel: [  217.057771] Call Trace:
Sep  7 16:37:55 erd kernel: [  217.057771]  [<c02a6d5f>] print_trailer+0xcf/0x120
Sep  7 16:37:55 erd kernel: [  217.057771]  [<c02a6e69>] check_bytes_and_report+0xb9/0xe0
Sep  7 16:37:55 erd kernel: [  217.057771]  [<c02a7097>] check_object+0x1b7/0x200
Sep  7 16:37:55 erd kernel: [  217.057771]  [<c02a89b6>] __slab_alloc+0x3d6/0x5a0
Sep  7 16:37:55 erd kernel: [  217.057771]  [<c02a9602>] __kmalloc+0x172/0x180
Sep  7 16:37:55 erd kernel: [  217.057771]  [<c02e4c02>] ? load_elf_binary+0x122/0x1550
Sep  7 16:37:55 erd kernel: [  217.057771]  [<c02e4c02>] load_elf_binary+0x122/0x1550
Sep  7 16:37:55 erd kernel: [  217.057771]  [<c035655e>] ? strrchr+0xe/0x30
Sep  7 16:37:55 erd kernel: [  217.057771]  [<c02e2644>] ? load_misc_binary+0x64/0x420
Sep  7 16:37:55 erd kernel: [  217.057771]  [<c029190f>] ? page_address+0xcf/0xf0
Sep  7 16:37:55 erd kernel: [  217.057771]  [<c0291aac>] ? kmap_high+0x1c/0x1e0
Sep  7 16:37:55 erd kernel: [  217.057771]  [<c029190f>] ? page_address+0xcf/0xf0
Sep  7 16:37:55 erd kernel: [  217.057771]  [<c029194a>] ? kunmap_high+0x1a/0x90
Sep  7 16:37:55 erd kernel: [  217.057771]  [<c02b20d7>] search_binary_handler+0xa7/0x240
Sep  7 16:37:55 erd kernel: [  217.057771]  [<c02b3686>] do_execve+0x2e6/0x3c0
Sep  7 16:37:56 erd kernel: [  217.057771]  [<c0201638>] sys_execve+0x28/0x60
Sep  7 16:37:56 erd kernel: [  217.057771]  [<c0202d08>] sysenter_do_call+0x12/0x26
Sep  7 16:37:56 erd kernel: [  217.057771] FIX kmalloc-256: Restoring 0xf6e622bc-0xf6e622bd=0x6b
Sep  7 16:37:56 erd kernel: [  217.057771]
Sep  7 16:37:56 erd kernel: [  217.057771] FIX kmalloc-256: Marking all objects used

^ permalink raw reply

* Re: net_sched 00/07: classful multiqueue dummy scheduler
From: Patrick McHardy @ 2009-09-07 17:28 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <4AA54128.2050607@gmail.com>

Eric Dumazet wrote:
>> I figured out the bug, which is likely responsible for both
>> problems. When grafting a mq class and creating a rate estimator,
>> the new qdisc is not attached to the device queue yet and also
>> doesn't have TC_H_ROOT as parent, so qdisc_create() selects
>> qdisc_root_sleeping_lock() for the estimator, which belongs to
>> the qdisc that is getting replaced.
>>
>> This is a patch I used for testing, but I'll come up with
>> something more elegant (I hope) as a final fix :)
> 
> Yes, this was the problem, and your patch fixed it.

Thanks for testing.

> Now adding CONFIG_SLUB_DEBUG_ON=y for next tries :)
> 
> Sep  7 16:37:55 erd kernel: [  217.056813] =============================================================================
> Sep  7 16:37:55 erd kernel: [  217.056865] BUG kmalloc-256: Poison overwritten
> Sep  7 16:37:55 erd kernel: [  217.056910] -----------------------------------------------------------------------------
> Sep  7 16:37:55 erd kernel: [  217.056911]
> Sep  7 16:37:55 erd kernel: [  217.056990] INFO: 0xf6e622bc-0xf6e622bd. First byte 0x76 instead of 0x6b
> Sep  7 16:37:55 erd kernel: [  217.057049] INFO: Allocated in qdisc_alloc+0x1b/0x80 age=154593 cpu=2 pid=5165
> Sep  7 16:37:55 erd kernel: [  217.057094] INFO: Freed in qdisc_destroy+0x88/0xa0 age=139186 cpu=4 pid=5173
> Sep  7 16:37:55 erd kernel: [  217.057139] INFO: Slab 0xc16ddc40 objects=26 used=6 fp=0xf6e62260 flags=0x28040c3
> Sep  7 16:37:55 erd kernel: [  217.057184] INFO: Object 0xf6e62260 @offset=608 fp=0xf6e62850
> Sep  7 16:37:55 erd kernel: [  217.057184]

I'm unable to reproduce this. Could you send me the commands you
used that lead to this?


^ permalink raw reply

* Re: net_sched 00/07: classful multiqueue dummy scheduler
From: Eric Dumazet @ 2009-09-07 17:30 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, netdev
In-Reply-To: <4AA542B4.4090206@trash.net>

Patrick McHardy a écrit :
> Eric Dumazet wrote:
>>> I figured out the bug, which is likely responsible for both
>>> problems. When grafting a mq class and creating a rate estimator,
>>> the new qdisc is not attached to the device queue yet and also
>>> doesn't have TC_H_ROOT as parent, so qdisc_create() selects
>>> qdisc_root_sleeping_lock() for the estimator, which belongs to
>>> the qdisc that is getting replaced.
>>>
>>> This is a patch I used for testing, but I'll come up with
>>> something more elegant (I hope) as a final fix :)
>> Yes, this was the problem, and your patch fixed it.
> 
> Thanks for testing.
> 
>> Now adding CONFIG_SLUB_DEBUG_ON=y for next tries :)
>>
>> Sep  7 16:37:55 erd kernel: [  217.056813] =============================================================================
>> Sep  7 16:37:55 erd kernel: [  217.056865] BUG kmalloc-256: Poison overwritten
>> Sep  7 16:37:55 erd kernel: [  217.056910] -----------------------------------------------------------------------------
>> Sep  7 16:37:55 erd kernel: [  217.056911]
>> Sep  7 16:37:55 erd kernel: [  217.056990] INFO: 0xf6e622bc-0xf6e622bd. First byte 0x76 instead of 0x6b
>> Sep  7 16:37:55 erd kernel: [  217.057049] INFO: Allocated in qdisc_alloc+0x1b/0x80 age=154593 cpu=2 pid=5165
>> Sep  7 16:37:55 erd kernel: [  217.057094] INFO: Freed in qdisc_destroy+0x88/0xa0 age=139186 cpu=4 pid=5173
>> Sep  7 16:37:55 erd kernel: [  217.057139] INFO: Slab 0xc16ddc40 objects=26 used=6 fp=0xf6e62260 flags=0x28040c3
>> Sep  7 16:37:55 erd kernel: [  217.057184] INFO: Object 0xf6e62260 @offset=608 fp=0xf6e62850
>> Sep  7 16:37:55 erd kernel: [  217.057184]
> 
> I'm unable to reproduce this. Could you send me the commands you
> used that lead to this?
> 

Sorry, this was *before* your last patch.

I tried to have more information, because I was not able to get console messages at crash time on this remote dev machine.

enabling SLUB checks got some hint of what the problem was (using memory block after its freeing by qdisc_destroy)

^ permalink raw reply

* Re: net_sched 00/07: classful multiqueue dummy scheduler
From: Patrick McHardy @ 2009-09-07 17:33 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <4AA54347.8020401@gmail.com>

Eric Dumazet wrote:
> Patrick McHardy a écrit :
>>> Sep  7 16:37:55 erd kernel: [  217.056813] =============================================================================
>>> Sep  7 16:37:55 erd kernel: [  217.056865] BUG kmalloc-256: Poison overwritten
>>> Sep  7 16:37:55 erd kernel: [  217.056910] -----------------------------------------------------------------------------
>>> Sep  7 16:37:55 erd kernel: [  217.056911]
>>> Sep  7 16:37:55 erd kernel: [  217.056990] INFO: 0xf6e622bc-0xf6e622bd. First byte 0x76 instead of 0x6b
>>> Sep  7 16:37:55 erd kernel: [  217.057049] INFO: Allocated in qdisc_alloc+0x1b/0x80 age=154593 cpu=2 pid=5165
>>> Sep  7 16:37:55 erd kernel: [  217.057094] INFO: Freed in qdisc_destroy+0x88/0xa0 age=139186 cpu=4 pid=5173
>>> Sep  7 16:37:55 erd kernel: [  217.057139] INFO: Slab 0xc16ddc40 objects=26 used=6 fp=0xf6e62260 flags=0x28040c3
>>> Sep  7 16:37:55 erd kernel: [  217.057184] INFO: Object 0xf6e62260 @offset=608 fp=0xf6e62850
>>> Sep  7 16:37:55 erd kernel: [  217.057184]
>> I'm unable to reproduce this. Could you send me the commands you
>> used that lead to this?
>>
> 
> Sorry, this was *before* your last patch.
> 
> I tried to have more information, because I was not able to get console messages at crash time on this remote dev machine.
> 
> enabling SLUB checks got some hint of what the problem was (using memory block after its freeing by qdisc_destroy)

OK, that probably explains it, the spinlock operations were operating
on already freed memory.

I'll do some more testing and will send the final patch if no
other problems show up.

^ permalink raw reply

* Re: net_sched 00/07: classful multiqueue dummy scheduler
From: Eric Dumazet @ 2009-09-07 17:38 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, netdev
In-Reply-To: <4AA54401.4010003@trash.net>

Patrick McHardy a écrit :
> Eric Dumazet wrote:
>> Patrick McHardy a écrit :
>>>> Sep  7 16:37:55 erd kernel: [  217.056813] =============================================================================
>>>> Sep  7 16:37:55 erd kernel: [  217.056865] BUG kmalloc-256: Poison overwritten
>>>> Sep  7 16:37:55 erd kernel: [  217.056910] -----------------------------------------------------------------------------
>>>> Sep  7 16:37:55 erd kernel: [  217.056911]
>>>> Sep  7 16:37:55 erd kernel: [  217.056990] INFO: 0xf6e622bc-0xf6e622bd. First byte 0x76 instead of 0x6b
>>>> Sep  7 16:37:55 erd kernel: [  217.057049] INFO: Allocated in qdisc_alloc+0x1b/0x80 age=154593 cpu=2 pid=5165
>>>> Sep  7 16:37:55 erd kernel: [  217.057094] INFO: Freed in qdisc_destroy+0x88/0xa0 age=139186 cpu=4 pid=5173
>>>> Sep  7 16:37:55 erd kernel: [  217.057139] INFO: Slab 0xc16ddc40 objects=26 used=6 fp=0xf6e62260 flags=0x28040c3
>>>> Sep  7 16:37:55 erd kernel: [  217.057184] INFO: Object 0xf6e62260 @offset=608 fp=0xf6e62850
>>>> Sep  7 16:37:55 erd kernel: [  217.057184]
>>> I'm unable to reproduce this. Could you send me the commands you
>>> used that lead to this?
>>>
>> Sorry, this was *before* your last patch.
>>
>> I tried to have more information, because I was not able to get console messages at crash time on this remote dev machine.
>>
>> enabling SLUB checks got some hint of what the problem was (using memory block after its freeing by qdisc_destroy)
> 
> OK, that probably explains it, the spinlock operations were operating
> on already freed memory.
> 
> I'll do some more testing and will send the final patch if no
> other problems show up.

BTW, you may ignore rate estimation requests on the mq root, since its stats
are updated only by user request, when doing a "tc -s -q qdisc" command, while
estimator is fired by a cyclic timer...



^ permalink raw reply

* Re: net_sched 00/07: classful multiqueue dummy scheduler
From: Patrick McHardy @ 2009-09-07 17:46 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <4AA54503.7090409@gmail.com>

Eric Dumazet wrote:
> Patrick McHardy a écrit :
>> Eric Dumazet wrote:
>>> Patrick McHardy a écrit :
>>>>> Sep  7 16:37:55 erd kernel: [  217.056813] =============================================================================
>>>>> Sep  7 16:37:55 erd kernel: [  217.056865] BUG kmalloc-256: Poison overwritten
>>>>> Sep  7 16:37:55 erd kernel: [  217.056910] -----------------------------------------------------------------------------
>>>>> Sep  7 16:37:55 erd kernel: [  217.056911]
>>>>> Sep  7 16:37:55 erd kernel: [  217.056990] INFO: 0xf6e622bc-0xf6e622bd. First byte 0x76 instead of 0x6b
>>>>> Sep  7 16:37:55 erd kernel: [  217.057049] INFO: Allocated in qdisc_alloc+0x1b/0x80 age=154593 cpu=2 pid=5165
>>>>> Sep  7 16:37:55 erd kernel: [  217.057094] INFO: Freed in qdisc_destroy+0x88/0xa0 age=139186 cpu=4 pid=5173
>>>>> Sep  7 16:37:55 erd kernel: [  217.057139] INFO: Slab 0xc16ddc40 objects=26 used=6 fp=0xf6e62260 flags=0x28040c3
>>>>> Sep  7 16:37:55 erd kernel: [  217.057184] INFO: Object 0xf6e62260 @offset=608 fp=0xf6e62850
>>>>> Sep  7 16:37:55 erd kernel: [  217.057184]
>>>> I'm unable to reproduce this. Could you send me the commands you
>>>> used that lead to this?
>>>>
>>> Sorry, this was *before* your last patch.
>>>
>>> I tried to have more information, because I was not able to get console messages at crash time on this remote dev machine.
>>>
>>> enabling SLUB checks got some hint of what the problem was (using memory block after its freeing by qdisc_destroy)
>> OK, that probably explains it, the spinlock operations were operating
>> on already freed memory.
>>
>> I'll do some more testing and will send the final patch if no
>> other problems show up.
> 
> BTW, you may ignore rate estimation requests on the mq root, since its stats
> are updated only by user request, when doing a "tc -s -q qdisc" command, while
> estimator is fired by a cyclic timer...

Yes, that's probably the cleanest solution. I was considering
cloning the root estimator to the real qdiscs and summing them
up, but for now I think I'll rather disable them on the mq root
completely.

^ permalink raw reply

* Re: net_sched 07/07: add classful multiqueue dummy scheduler
From: Jarek Poplawski @ 2009-09-07 18:22 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev
In-Reply-To: <4AA50A49.7010905@trash.net>

On Mon, Sep 07, 2009 at 03:27:37PM +0200, Patrick McHardy wrote:
> Jarek Poplawski wrote:
...
> >> @@ -1095,10 +1100,16 @@ create_n_graft:
> >>  		q = qdisc_create(dev, &dev->rx_queue,
> >>  				 tcm->tcm_parent, tcm->tcm_parent,
> >>  				 tca, &err);
> >> -	else
> >> -		q = qdisc_create(dev, netdev_get_tx_queue(dev, 0),
> >> +	else {
> >> +		unsigned int ntx = 0;
> >> +
> >> +		if (p && p->ops->cl_ops && p->ops->cl_ops->select_queue)
> >> +			ntx = p->ops->cl_ops->select_queue(p, tcm);
> > 
> > So, this if could be probably made shorter with a common function, but
> > the main point is: this probably works only for qdiscs having mq as a
> > parent, and not below.
> 
> Yes. mq can only be attached to the root however, so its not
> possible to use it as a child qdisc.

I mean this ->select_queue() works OK for a child qdisc of mq, e.g.
htb, but not for a child qdisc of this htb qdisc, e.g. sfq.

Jarek P.

^ permalink raw reply

* Re: [PATCH 00/12] Gigaset driver patches for 2.6.32
From: Karsten Keil @ 2009-09-07 18:56 UTC (permalink / raw)
  To: David Miller; +Cc: tilman, linux-kernel, netdev, i4ldeveloper, hjlipp
In-Reply-To: <20090907.021808.57016299.davem@davemloft.net>

On Montag, 7. September 2009 11:18:08 David Miller wrote:
> From: Tilman Schmidt <tilman@imap.cc>
> Date: Mon, 07 Sep 2009 11:11:37 +0200
>
> > David Miller schrieb:
> >> From: Tilman Schmidt <tilman@imap.cc>
> >> Date: Sun,  6 Sep 2009 20:58:52 +0200 (CEST)
> >>
> >>> Would you please take these into your net tree for 2.6.32.
> >>
> >> So do we have an ISDN maintainer or not?
> >>
> >> If Karsten is still maintaining things, your work should
> >> go through him not me.
> >
> > Last atime I asked, you said me you would take ISDN patches.
> > Could you two please sort this out?
>
> Sure.
>
> Karsten, are you going to handle ISDN and ISDN driver patches
> and queue them up to me?
>

Yes, I will  put them in my git tree after review and give you a ping.

> > I am just a humble driver maintainer knowing nothing about
> > maintainer politics.
>
> Understood.



^ permalink raw reply

* Re: net_sched 07/07: add classful multiqueue dummy scheduler
From: Jarek Poplawski @ 2009-09-07 19:24 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev
In-Reply-To: <4AA50A49.7010905@trash.net>

On Mon, Sep 07, 2009 at 03:27:37PM +0200, Patrick McHardy wrote:
> Jarek Poplawski wrote:
...
> >> +static int mq_dump(struct Qdisc *sch, struct sk_buff *skb)
> >> +{
> >> +	struct net_device *dev = qdisc_dev(sch);
> >> +	struct Qdisc *qdisc;
> >> +	unsigned int ntx;
> >> +
> >> +	sch->q.qlen = 0;
> >> +	memset(&sch->bstats, 0, sizeof(sch->bstats));
> >> +	memset(&sch->qstats, 0, sizeof(sch->qstats));
> >> +
> >> +	for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
> >> +		qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping;
> >> +		spin_lock_bh(qdisc_lock(qdisc));
> >> +		sch->q.qlen		+= qdisc->q.qlen;
> >> +		sch->bstats.bytes	+= qdisc->bstats.bytes;
> >> +		sch->bstats.packets	+= qdisc->bstats.packets;
> >> +		sch->qstats.qlen	+= qdisc->qstats.qlen;
> > 
> > Like in Christoph's case, we should probably use q.qlen instead.
> 
> Its done a few lines above. This simply sums up all members of qstats.

AFAICS these members are updated only in tc_fill_qdisc, starting from
the root, so they might be not up-to-date at the moment, unless I miss
something.

Jarek P.

^ permalink raw reply

* Re: net_sched 07/07: add classful multiqueue dummy scheduler
From: Eric Dumazet @ 2009-09-07 19:49 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Patrick McHardy, netdev
In-Reply-To: <20090907192429.GC4451@ami.dom.local>

Jarek Poplawski a écrit :
> On Mon, Sep 07, 2009 at 03:27:37PM +0200, Patrick McHardy wrote:
>> Jarek Poplawski wrote:
> ...
>>>> +static int mq_dump(struct Qdisc *sch, struct sk_buff *skb)
>>>> +{
>>>> +	struct net_device *dev = qdisc_dev(sch);
>>>> +	struct Qdisc *qdisc;
>>>> +	unsigned int ntx;
>>>> +
>>>> +	sch->q.qlen = 0;
>>>> +	memset(&sch->bstats, 0, sizeof(sch->bstats));
>>>> +	memset(&sch->qstats, 0, sizeof(sch->qstats));
>>>> +
>>>> +	for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
>>>> +		qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping;
>>>> +		spin_lock_bh(qdisc_lock(qdisc));
>>>> +		sch->q.qlen		+= qdisc->q.qlen;
>>>> +		sch->bstats.bytes	+= qdisc->bstats.bytes;
>>>> +		sch->bstats.packets	+= qdisc->bstats.packets;
>>>> +		sch->qstats.qlen	+= qdisc->qstats.qlen;
>>> Like in Christoph's case, we should probably use q.qlen instead.
>> Its done a few lines above. This simply sums up all members of qstats.
> 
> AFAICS these members are updated only in tc_fill_qdisc, starting from
> the root, so they might be not up-to-date at the moment, unless I miss
> something.
> 

Yes, we might need an q->ops->update_stats(struct Qdisc *sch) method, and
to recursively call it from mq_update_stats()


^ permalink raw reply

* hello
From: janesoumah @ 2009-09-07 18:52 UTC (permalink / raw)


Hello Dear Good Day!!!,
How are you today, i hope all is well with you. My name is Jane  I want to have a good relationship and share things in common with you. i will like to tell you more about me and send some of my pictures to you, As soon as you reply my letter to my private email.(janesoumah@yahoo.co.uk Hoping to have a good relationship with you. yours lovely Friend, Jane

^ permalink raw reply

* Stop using tasklets for bottom halves
From: Luis R. Rodriguez @ 2009-09-07 22:58 UTC (permalink / raw)
  To: Steven Rostedt, Ingo Molnar, Michael Buesch, John W. Linville
  Cc: linux-wireless, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Matt Smith, Kevin Hayes,
	Bob Copeland, Jouni Malinen, Ivan Seskar,
	ic.felix-Re5JQEeQqe8AvxtiuMwx3w

A while ago I had read about an effort to consider removing tasklets
[1] or at least trying to not use them. I'm unaware of the progress in
this respect but since reading that article have always tried to
evaluate whether or not we need tasklets on wireless drivers. I have
also wondered whether work in irq context in other parts of the kernel
can be moved to process context, a curious example being timers. I'll
personally be trying to using only process context on bottom halves on
future drivers but I figured it may be a good time to ask how serious
was avoiding tasklets or using wrappers in the future to avoid irq
context is or is it advised. Do we have a general agreement this is a
good step forward to take? Has anyone made tests or changes on a
specific driver from irq context to process context and proven there
are no significant advantages of using irq context where you would
have expected it?

Wireless in particular should IMHO not require taskets for anything
time sensitive that I can think about except perhaps changing channels
quickly and to do that appropriately also process pending RX frames
prior to a switch. It remains to be seen experimentally whether or not
using a workqueue for RX processing would affect the time to switch
channels negatively but I doubt it would be significant. I hope to
test that with ath9k_htc.

What about gigabit or 10 Gigabit Ethernet drivers ? Do they face any
challenges which would yet need to be proven would not face issues
when processing bottom halves in process context?

[1] http://lwn.net/Articles/239633/

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: Stop using tasklets for bottom halves
From: Stephen Hemminger @ 2009-09-08  0:14 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Steven Rostedt, Ingo Molnar, Michael Buesch, John W. Linville,
	linux-wireless, linux-kernel, netdev, Matt Smith, Kevin Hayes,
	Bob Copeland, Jouni Malinen, Ivan Seskar, ic.felix
In-Reply-To: <43e72e890909071558s637b45c7i10807587dc40e8c4@mail.gmail.com>

On Mon, 7 Sep 2009 15:58:50 -0700
"Luis R. Rodriguez" <mcgrof@gmail.com> wrote:

> A while ago I had read about an effort to consider removing tasklets
> [1] or at least trying to not use them. I'm unaware of the progress in
> this respect but since reading that article have always tried to
> evaluate whether or not we need tasklets on wireless drivers. I have
> also wondered whether work in irq context in other parts of the kernel
> can be moved to process context, a curious example being timers. I'll
> personally be trying to using only process context on bottom halves on
> future drivers but I figured it may be a good time to ask how serious
> was avoiding tasklets or using wrappers in the future to avoid irq
> context is or is it advised. Do we have a general agreement this is a
> good step forward to take? Has anyone made tests or changes on a
> specific driver from irq context to process context and proven there
> are no significant advantages of using irq context where you would
> have expected it?
> 
> Wireless in particular should IMHO not require taskets for anything
> time sensitive that I can think about except perhaps changing channels
> quickly and to do that appropriately also process pending RX frames
> prior to a switch. It remains to be seen experimentally whether or not
> using a workqueue for RX processing would affect the time to switch
> channels negatively but I doubt it would be significant. I hope to
> test that with ath9k_htc.
> 
> What about gigabit or 10 Gigabit Ethernet drivers ? Do they face any
> challenges which would yet need to be proven would not face issues
> when processing bottom halves in process context?
> 
> [1] http://lwn.net/Articles/239633/
> 
>   Luis
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Why not use NAPI, which is soft irq? Almost all 1G and 10G drivers
use NAPI.

Process context is too slow.

-- 

^ permalink raw reply

* Re: Stop using tasklets for bottom halves
From: Steven Rostedt @ 2009-09-08  2:17 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Luis R. Rodriguez, Ingo Molnar, Michael Buesch, John W. Linville,
	linux-wireless, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Matt Smith, Kevin Hayes,
	Bob Copeland, Jouni Malinen, Ivan Seskar,
	ic.felix-Re5JQEeQqe8AvxtiuMwx3w
In-Reply-To: <20090907171406.6a4b6116@nehalam>

On Mon, 2009-09-07 at 17:14 -0700, Stephen Hemminger wrote:
> On Mon, 7 Sep 2009 15:58:50 -0700
> "Luis R. Rodriguez" <mcgrof-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> 
> > A while ago I had read about an effort to consider removing tasklets
> > [1] or at least trying to not use them. I'm unaware of the progress in
> > this respect but since reading that article have always tried to
> > evaluate whether or not we need tasklets on wireless drivers. I have
> > also wondered whether work in irq context in other parts of the kernel
> > can be moved to process context, a curious example being timers. I'll
> > personally be trying to using only process context on bottom halves on
> > future drivers but I figured it may be a good time to ask how serious
> > was avoiding tasklets or using wrappers in the future to avoid irq
> > context is or is it advised. Do we have a general agreement this is a
> > good step forward to take? Has anyone made tests or changes on a
> > specific driver from irq context to process context and proven there
> > are no significant advantages of using irq context where you would
> > have expected it?
> > 
> > Wireless in particular should IMHO not require taskets for anything
> > time sensitive that I can think about except perhaps changing channels
> > quickly and to do that appropriately also process pending RX frames
> > prior to a switch. It remains to be seen experimentally whether or not
> > using a workqueue for RX processing would affect the time to switch
> > channels negatively but I doubt it would be significant. I hope to
> > test that with ath9k_htc.
> > 
> > What about gigabit or 10 Gigabit Ethernet drivers ? Do they face any
> > challenges which would yet need to be proven would not face issues
> > when processing bottom halves in process context?
> > 
> > [1] http://lwn.net/Articles/239633/
> > 
> >   Luis
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> Why not use NAPI, which is soft irq? Almost all 1G and 10G drivers
> use NAPI.
> 
> Process context is too slow.

Well, I'm hoping to prove the opposite. I'm working on some stuff that I
plan to present at Linux Plumbers. I've been too distracted by other
things, but hopefully I'll have some good numbers to present by then.

-- Steve


--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [iproute2] tc action mirred    question
From: jamal @ 2009-09-08  2:16 UTC (permalink / raw)
  To: Xiaofei Wu; +Cc: linux netdev
In-Reply-To: <404956.88336.qm@web111607.mail.gq1.yahoo.com>

On Mon, 2009-09-07 at 09:05 -0700, Xiaofei Wu wrote:

> (1) Could I use  pedit action to modify the dst MAC, so the destination node D will accept it, 
> then forward it to node C?  

Yes, you can achieve it with pedit; 


> (or use other tools to modify the dst MAC, please give me more information)
> 

it is as usable as u32 is - you have to know your offsets
example, here's something done on an incoming packet:
=-=
#Note:
#dst MAC starts at -14
#src MAC at -8
#ethertype at -2
#
tc filter add dev eth1 parent ffff: protocol ip prio 10 u32 \
match ip src 192.168.2.11/32 flowid 1:2 \
action pedit munge offset -14 u16 set 0x0000 \
munge offset -12 u32 set 0x00000200 \
munge offset -8 u32 set 0x0aaf0100 \
munge offset -4 u32 set 0x0008eb06 pipe \
action mirred egress redirect dev eth0
----

> (2) If I use 'ifconfig wlan0 promisc ... ' on node D, would it route the mirroring packets
>  (the dst MAC is incorrect)
> to node C?

It may work.
Go and try running some experiments.

cheers,
jamal


^ permalink raw reply

* Re: Stop using tasklets for bottom halves
From: Luis R. Rodriguez @ 2009-09-08  4:16 UTC (permalink / raw)
  To: rostedt-nx8X9YLhiw1AfugRpC6u6w
  Cc: Stephen Hemminger, Ingo Molnar, Michael Buesch, John W. Linville,
	linux-wireless, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Matt Smith, Kevin Hayes,
	Bob Copeland, Jouni Malinen, Ivan Seskar,
	ic.felix-Re5JQEeQqe8AvxtiuMwx3w
In-Reply-To: <1252376254.21261.2052.camel-f9ZlEuEWxVcI6MkJdU+c8EEOCMrvLtNR@public.gmane.org>

On Mon, Sep 7, 2009 at 7:17 PM, Steven Rostedt<rostedt-nx8X9YLhiw1AfugRpC6u6w@public.gmane.org> wrote:
> On Mon, 2009-09-07 at 17:14 -0700, Stephen Hemminger wrote:
>> On Mon, 7 Sep 2009 15:58:50 -0700
>> "Luis R. Rodriguez" <mcgrof-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>
>> > A while ago I had read about an effort to consider removing tasklets
>> > [1] or at least trying to not use them. I'm unaware of the progress in
>> > this respect but since reading that article have always tried to
>> > evaluate whether or not we need tasklets on wireless drivers. I have
>> > also wondered whether work in irq context in other parts of the kernel
>> > can be moved to process context, a curious example being timers. I'll
>> > personally be trying to using only process context on bottom halves on
>> > future drivers but I figured it may be a good time to ask how serious
>> > was avoiding tasklets or using wrappers in the future to avoid irq
>> > context is or is it advised. Do we have a general agreement this is a
>> > good step forward to take? Has anyone made tests or changes on a
>> > specific driver from irq context to process context and proven there
>> > are no significant advantages of using irq context where you would
>> > have expected it?
>> >
>> > Wireless in particular should IMHO not require taskets for anything
>> > time sensitive that I can think about except perhaps changing channels
>> > quickly and to do that appropriately also process pending RX frames
>> > prior to a switch. It remains to be seen experimentally whether or not
>> > using a workqueue for RX processing would affect the time to switch
>> > channels negatively but I doubt it would be significant. I hope to
>> > test that with ath9k_htc.
>> >
>> > What about gigabit or 10 Gigabit Ethernet drivers ? Do they face any
>> > challenges which would yet need to be proven would not face issues
>> > when processing bottom halves in process context?
>> >
>> > [1] http://lwn.net/Articles/239633/
>> >
>> >   Luis
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe netdev" in
>> > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>> Why not use NAPI, which is soft irq? Almost all 1G and 10G drivers
>> use NAPI.
>>
>> Process context is too slow.
>
> Well, I'm hoping to prove the opposite. I'm working on some stuff that I
> plan to present at Linux Plumbers. I've been too distracted by other
> things, but hopefully I'll have some good numbers to present by then.

What day in specific was this planned for at Plumbers?

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox