* Re: [net-next RFC V5 5/5] virtio_net: support negotiating the number of queues through ctrl vq
From: Jason Wang @ 2012-07-06 3:20 UTC (permalink / raw)
To: Sasha Levin
Cc: krkumar2, habanero, mashirle, kvm, mst, netdev, linux-kernel,
virtualization, edumazet, tahm, jwhan, davem, sri
In-Reply-To: <1341492679.18786.18.camel@lappy>
On 07/05/2012 08:51 PM, Sasha Levin wrote:
> On Thu, 2012-07-05 at 18:29 +0800, Jason Wang wrote:
>> @@ -1387,6 +1404,10 @@ static int virtnet_probe(struct virtio_device *vdev)
>> if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ))
>> vi->has_cvq = true;
>>
>> + /* Use single tx/rx queue pair as default */
>> + vi->num_queue_pairs = 1;
>> + vi->total_queue_pairs = num_queue_pairs;
> The code is using this "default" even if the amount of queue pairs it
> wants was specified during initialization. This basically limits any
> device to use 1 pair when starting up.
>
Yes, currently the virtio-net driver uses 1 txq/rxq pair by default,
since multiqueue may not perform better for all kinds of workloads. So it's
better to keep that as the default and let the user enable multiqueue with ethtool -L.
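The policy described in this reply can be sketched as a small user-space model. This is only an illustration under stated assumptions: the struct and function names (vnet_model, vnet_model_init, vnet_model_set_pairs) are hypothetical, and only the two field names are borrowed from the quoted hunk. It is not the driver code.

```c
#include <errno.h>

/* Hypothetical user-space model; only the field names follow the
 * quoted patch. This is not the virtio-net driver code. */
struct vnet_model {
	unsigned int num_queue_pairs;   /* pairs currently in use */
	unsigned int total_queue_pairs; /* pairs the device offers */
};

/* Start with a single pair, however many pairs the device offers. */
static void vnet_model_init(struct vnet_model *vi, unsigned int offered)
{
	vi->total_queue_pairs = offered ? offered : 1;
	vi->num_queue_pairs = 1;
}

/* Model of an "ethtool -L"-style request: accept 1..total, else -EINVAL. */
static int vnet_model_set_pairs(struct vnet_model *vi, unsigned int want)
{
	if (want == 0 || want > vi->total_queue_pairs)
		return -EINVAL;
	vi->num_queue_pairs = want;
	return 0;
}
```

In this model the device still records how many pairs it offers, so a later request can raise the count without re-probing, which seems to be the trade-off being discussed.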
^ permalink raw reply
* Re: [net-next RFC V5 2/5] virtio_ring: move queue_index to vring_virtqueue
From: Jason Wang @ 2012-07-06 3:17 UTC (permalink / raw)
To: Sasha Levin
Cc: krkumar2, habanero, mashirle, kvm, mst, netdev, linux-kernel,
virtualization, edumazet, tahm, jwhan, davem, sri
In-Reply-To: <1341488454.18786.15.camel@lappy>
On 07/05/2012 07:40 PM, Sasha Levin wrote:
> On Thu, 2012-07-05 at 18:29 +0800, Jason Wang wrote:
>> Instead of storing the queue index in virtio infos, this patch moves them to
>> vring_virtqueue and introduces helpers to set and get the value. This would
>> simplify the management and tracing.
>>
>> Signed-off-by: Jason Wang<jasowang@redhat.com>
> This patch actually fails to compile:
>
> drivers/virtio/virtio_mmio.c: In function ‘vm_notify’:
> drivers/virtio/virtio_mmio.c:229:13: error: ‘struct virtio_mmio_vq_info’ has no member named ‘queue_index’
> drivers/virtio/virtio_mmio.c: In function ‘vm_del_vq’:
> drivers/virtio/virtio_mmio.c:278:13: error: ‘struct virtio_mmio_vq_info’ has no member named ‘queue_index’
> make[2]: *** [drivers/virtio/virtio_mmio.o] Error 1
>
> It probably misses the following hunks:
>
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index f5432b6..12b6180 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
> @@ -222,11 +222,10 @@ static void vm_reset(struct virtio_device *vdev)
> static void vm_notify(struct virtqueue *vq)
> {
> struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vq->vdev);
> - struct virtio_mmio_vq_info *info = vq->priv;
>
> /* We write the queue's selector into the notification register to
> * signal the other end */
> - writel(info->queue_index, vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY);
> + writel(virtqueue_get_queue_index(vq), vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY);
> }
>
> /* Notify all virtqueues on an interrupt. */
> @@ -275,7 +274,7 @@ static void vm_del_vq(struct virtqueue *vq)
> vring_del_virtqueue(vq);
>
> /* Select and deactivate the queue */
> - writel(info->queue_index, vm_dev->base + VIRTIO_MMIO_QUEUE_SEL);
> + writel(virtqueue_get_queue_index(vq), vm_dev->base + VIRTIO_MMIO_QUEUE_SEL);
> writel(0, vm_dev->base + VIRTIO_MMIO_QUEUE_PFN);
>
> size = PAGE_ALIGN(vring_size(info->num, VIRTIO_MMIO_VRING_ALIGN));
>
Oops, I missed the virtio-mmio part. Thanks for pointing this out.
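For readers following the thread, the idea behind the patch (keep the queue index with the ring itself and reach it through helpers) might look roughly like the sketch below. All type and function names here are hypothetical stand-ins; only virtqueue_get_queue_index() in the quoted hunks is taken from the thread, and its real implementation is not shown there.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct virtqueue_model {
	void *priv; /* transport-private data no longer needs the index */
};

struct vring_virtqueue_model {
	struct virtqueue_model vq; /* public part, embedded */
	unsigned int queue_index;  /* index now stored with the ring */
};

/* Recover the containing ring from the public virtqueue pointer. */
static struct vring_virtqueue_model *to_vvq(struct virtqueue_model *vq)
{
	return (struct vring_virtqueue_model *)
		((char *)vq - offsetof(struct vring_virtqueue_model, vq));
}

static void model_set_queue_index(struct virtqueue_model *vq, unsigned int idx)
{
	to_vvq(vq)->queue_index = idx;
}

static unsigned int model_get_queue_index(struct virtqueue_model *vq)
{
	return to_vvq(vq)->queue_index;
}
```

With accessors like these, each transport (PCI, MMIO, and so on) calls the getter instead of keeping its own copy of the index, which is why the MMIO hunks above drop info->queue_index.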
^ permalink raw reply
* Re: [PATCH net-next] asix: avoid copies in tx path
From: Ming Lei @ 2012-07-06 1:16 UTC (permalink / raw)
To: Eric Dumazet
Cc: David Miller, netdev, Greg Kroah-Hartman, Allan Chou,
Trond Wuellner, Grant Grundler
In-Reply-To: <1341498661.2583.4162.camel@edumazet-glaptop>
On Thu, Jul 5, 2012 at 10:31 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> I noticed excess calls to skb_copy_expand() or memmove() in asix driver.
>
> This driver needs to push 4 bytes in front of frame (packet_len)
> and maybe add 4 bytes after the end (if padlen is 4)
>
> So it should set needed_headroom & needed_tailroom to avoid
> copies. But its not enough, because many packets are cloned
> before entering asix_tx_fixup() and this driver use skb_cloned()
> as a lazy way to check if it can push and put additional bytes in frame.
>
> Avoid skb_copy_expand() expensive call, using following rules :
>
> - We are allowed to push 4 bytes in headroom if skb_header_cloned()
> is false (and if we have 4 bytes of headroom)
>
> - We are allowed to put 4 bytes at tail if skb_cloned()
> is false (and if we have 4 bytes of tailroom)
>
> TCP packets for example are cloned, but skb_header_release()
> was called in tcp stack, allowing us to use headroom for our needs.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Allan Chou <allan@asix.com.tw>
> Cc: Trond Wuellner <trond@chromium.org>
> Cc: Grant Grundler <grundler@chromium.org>
> Cc: Paul Stewart <pstew@chromium.org>
> Cc: Ming Lei <tom.leiming@gmail.com>
After testing the patch on a beagle-xm with an external DLINK DUB-E100 NIC,
transmit performance increased from ~75Mbps to ~91Mbps with
DEBUG_SLAB enabled. The test command and result follow:
[root@root]#iperf -c 192.168.0.103 -w 131072 -t 10
------------------------------------------------------------
Client connecting to 192.168.0.103, TCP port 5001
TCP window size: 256 KByte (WARNING: requested 128 KByte)
------------------------------------------------------------
[ 3] local 192.168.0.102 port 57888 connected with 192.168.0.103 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 109 MBytes 91.6 Mbits/sec
Tested-by: Ming Lei <ming.lei@canonical.com>
Thanks,
--
Ming Lei
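The two rules in the quoted commit message can be expressed as a small decision function. This is a sketch under assumptions: struct skb_model and can_fixup_in_place() are hypothetical, the two booleans merely stand in for the results of skb_header_cloned() and skb_cloned(), and padlen is passed in rather than computed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the skb properties the rules depend on. */
struct skb_model {
	bool header_cloned; /* stands in for skb_header_cloned() */
	bool cloned;        /* stands in for skb_cloned() */
	size_t headroom;    /* free bytes in front of the data */
	size_t tailroom;    /* free bytes after the data */
};

#define ASIX_MODEL_HDR_LEN 4 /* bytes pushed in front (packet_len) */

/* True when the frame can be fixed up in place, without a copy.
 * padlen is 0 or 4, as described in the commit message. */
static bool can_fixup_in_place(const struct skb_model *skb, size_t padlen)
{
	bool head_ok = !skb->header_cloned &&
		       skb->headroom >= ASIX_MODEL_HDR_LEN;
	bool tail_ok = padlen == 0 ||
		       (!skb->cloned && skb->tailroom >= padlen);

	return head_ok && tail_ok;
}
```

A cloned TCP packet whose header was released would have cloned set but header_cloned clear, so it passes the headroom rule and only needs a copy when padding must be appended.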
^ permalink raw reply
* Re: TCP transmit performance regression
From: Ming Lei @ 2012-07-06 0:45 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Network Development, David Miller
In-Reply-To: <1341500196.2583.4222.camel@edumazet-glaptop>
On Thu, Jul 5, 2012 at 10:56 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2012-07-05 at 22:01 +0800, Ming Lei wrote:
>
>> At default SMSC95xx turbo mode is true, rx buffer will be very big
>> (17.5K). Or the large rx buffer size puts limit on concurrent URBs/SKBs
>> count. Both two may cause the problem.
>
> I see. So we should try to recycle these large rx buffers in usbnet
> instead of allocating/freeing them for each incoming packet.
>
> Following patch does the copybreak of all incoming frames.
>
> It has nice property of not lying anymore on skb truesize ;)
>
> It should be applied on both sender and receiver
In fact, I ran the following command on the beagle-xm test box with the
SMSC95xx NIC:
iperf -c 192.168.0.103 -w 131072 -t 10
and the following command on an x86 production machine (e1000e NIC)
running Ubuntu 12.04:
iperf -s -w 131072
The current problem is that transmit performance on beagle-xm is
poor in the above iperf test when DEBUG_SLAB is enabled. But if
I set dev->rx_urb_size to 2048, transmit performance
doubles, so it looks like the large rx buffer is the cause.
>
> drivers/net/usb/smsc95xx.c | 19 +++----------------
> 1 file changed, 3 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
> index b1112e7..3d9566f 100644
> --- a/drivers/net/usb/smsc95xx.c
> +++ b/drivers/net/usb/smsc95xx.c
> @@ -1080,30 +1080,17 @@ static int smsc95xx_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
> return 0;
> }
>
> - /* last frame in this batch */
> - if (skb->len == size) {
> - if (dev->net->features & NETIF_F_RXCSUM)
> - smsc95xx_rx_csum_offload(skb);
> - skb_trim(skb, skb->len - 4); /* remove fcs */
> - skb->truesize = size + sizeof(struct sk_buff);
> -
> - return 1;
> - }
> -
> - ax_skb = skb_clone(skb, GFP_ATOMIC);
> + ax_skb = netdev_alloc_skb_ip_align(dev->net, size);
> if (unlikely(!ax_skb)) {
> netdev_warn(dev->net, "Error allocating skb\n");
> return 0;
> }
>
> - ax_skb->len = size;
> - ax_skb->data = packet;
> - skb_set_tail_pointer(ax_skb, size);
> + memcpy(skb_put(ax_skb, size), packet, size);
>
> if (dev->net->features & NETIF_F_RXCSUM)
> smsc95xx_rx_csum_offload(ax_skb);
> - skb_trim(ax_skb, ax_skb->len - 4); /* remove fcs */
> - ax_skb->truesize = size + sizeof(struct sk_buff);
> + __skb_trim(ax_skb, ax_skb->len - 4); /* remove fcs */
>
> usbnet_skb_return(dev, ax_skb);
> }
>
>
Unfortunately, the patch does not improve transmit
performance on beagle-xm.
Thanks,
--
Ming Lei
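The "copybreak" approach in the quoted patch (copy each frame out of the large rx buffer into a right-sized allocation, then trim the FCS) can be illustrated in user space as follows. This is a sketch only: struct frame_copy and copybreak_frame() are hypothetical names, malloc() stands in for the skb allocation, and the device's batch framing is deliberately not modelled.

```c
#include <stdlib.h>
#include <string.h>

#define FCS_LEN 4 /* trailing frame check sequence, dropped after the copy */

/* Hypothetical right-sized copy of one received frame. */
struct frame_copy {
	unsigned char *data;
	size_t len; /* payload length, without the FCS */
};

/* Copy one frame of `size` bytes out of a large, reusable rx buffer
 * into its own allocation, then drop the FCS from the reported length.
 * Returns 0 on success, -1 on a short frame or allocation failure. */
static int copybreak_frame(const unsigned char *packet, size_t size,
			   struct frame_copy *out)
{
	if (size < FCS_LEN)
		return -1;
	out->data = malloc(size);
	if (!out->data)
		return -1;
	memcpy(out->data, packet, size);
	out->len = size - FCS_LEN;
	return 0;
}
```

The caller owns out->data and frees it when done. The large buffer itself is never handed up the stack, which is what would let it be recycled.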
^ permalink raw reply
* Re: Network namespace and bonding WARNING at fs/proc/generic.c remove_proc_entry
From: Eric W. Biederman @ 2012-07-06 0:41 UTC (permalink / raw)
To: Serge E. Hallyn; +Cc: Dilip Daya, linux-kernel, containers, netdev
In-Reply-To: <20120705220749.GA11255@mail.hallyn.com>
"Serge E. Hallyn" <serge@hallyn.com> writes:
> Quoting Dilip Daya (dilip.daya@hp.com):
>> Hi,
>>
>> I'd discussed the following with Serge Hallyn.
>>
>> => Environment based on 3.2.18 / x86_64 kernel.
>> => WARNING: at fs/proc/generic.c:808 remove_proc_entry+0xdb/0x21f()
>> => WARNING: at fs/proc/generic.c:849 remove_proc_entry+0x208/0x21f()
>
> Hi,
>
> thanks much for sending this. I'm still getting this error on
> 3.5.0-2-generic (today's ubuntu quantal kernel)
>
>> network namespace and bonding
>> -----------------------------
>>
>> * Migrate two phy nics from host to netns (netns0).
>> - ip link set ethX netns netns0
>>
>> * In host environment:
>> - load bonding module, /sbin/modprobe -v bonding mode=1 miimon=100
>> - /sys/class/net/bond0 exists.
>> - /proc/net/bonding/bond0 exists.
>> - /sys/class/net/bonding_masters has bond0.
>>
>> * Migrate bond0 to netns (netns0):
>> - ip link set bond0 netns netns0.
>>
>> * Within netns (netns0):
>> - /sys/class/net/bonding_masters is empty.
>> - /sys/class/net/bond0 exist.
>> - configure bond0 and ifenslave with two phy nics.
>> - /proc/net/bonding/bond0 does not exist within netns0, but does
>> exist in the host environment.
>> - /sys/class/net/bonding_masters is empty.
>
> mine is not empty, fwiw. However
>
>> - ping to remote end of bond0 works.
>>
>> * Within netns (netns0), flushing ethX and bondY:
>> - down bond0 and its phy nic interfaces:
>> - ip link set ... down
>> - ip addr flush dev [bond0 | eth#]
>> - deleting bond0, /sbin/ip link del dev bond0
>
> Yup I still get a remove_proc_entry WARNING at fs/proc/generic.c:808,
> which is the warning when (!de)
It looks like Dilip is running an old kernel. There should have been
some version of /sys/class/net/bonding_masters in every network
namespace since sometime in 2009.
From the warning, it looks like the proc files are being added to and
removed from the wrong network namespace. So in one namespace we get an
error when we delete the moved device, and in the other network namespace
we get an error when we remove the /proc directory.
An old kernel without proper network namespace support is the only
reason I can imagine for someone moving an existing bond device
between network namespaces.
If there are other reasons for wanting to move a bonding device between
network namespaces, it is possible to catch the NETDEV_UNREGISTER and
NETDEV_REGISTER events to remove/add the per-device proc files at the
appropriate time.
However, since moving bonding devices appears to be an unneeded operation,
let's just keep things simple and forbid moving bonding devices between
network namespaces. Serge, Dilip, can you two test the patch below
and see if it fixes the warnings?
Eric
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 2ee8cf9..818ed64 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4345,6 +4345,9 @@ static void bond_setup(struct net_device *bond_dev)
bond_dev->priv_flags |= IFF_BONDING;
bond_dev->priv_flags &= ~(IFF_XMIT_DST_RELEASE | IFF_TX_SKB_SHARING);
+ /* Don't allow bond devices to change network namespaces. */
+ bond_dev->features |= NETIF_F_NETNS_LOCAL;
+
/* At first, we block adding VLANs. That's the only way to
* prevent problems that occur when adding VLANs over an
* empty bond. The block will be removed once non-challenged
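The effect of marking a device as namespace-local, as the patch above does for bond devices, can be sketched with a small model. The names here (MODEL_F_NETNS_LOCAL, struct netdev_model, model_change_netns) are hypothetical, and the integer namespace id is an assumption made purely for illustration.

```c
#include <errno.h>

#define MODEL_F_NETNS_LOCAL (1u << 0) /* stands in for the kernel feature flag */

/* Hypothetical, minimal model of a network device. */
struct netdev_model {
	unsigned int features;
	int netns_id; /* hypothetical namespace identifier */
};

/* Refuse to move a device that is marked namespace-local. */
static int model_change_netns(struct netdev_model *dev, int new_ns)
{
	if (dev->features & MODEL_F_NETNS_LOCAL)
		return -EINVAL;
	dev->netns_id = new_ns;
	return 0;
}
```

Under this model an "ip link set bond0 netns netns0" style request fails up front, so the per-device proc files are never left behind in the original namespace.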
^ permalink raw reply related
* Re: BISECTED: Re: REGRESSION: 3.4.0->3.5.0-rc2 kernel WARNING on cable plug on Acer Aspire One, no network
From: Alex Villacís Lasso @ 2012-07-06 0:35 UTC (permalink / raw)
To: Marek Szyprowski; +Cc: 'Francois Romieu', netdev
In-Reply-To: <012601cd5a7b$886fd4c0$994f7e40$%szyprowski@samsung.com>
On 05/07/12 01:58, Marek Szyprowski wrote:
> Hello,
>
> On Thursday, July 05, 2012 6:15 AM Alex Villacís Lasso wrote:
>
>> On 04/07/12 02:02, Marek Szyprowski wrote:
>>> Hello,
>>>
>>> On Tuesday, July 03, 2012 4:27 PM Alex Villacís Lasso wrote:
>>>
>>>> On 03/07/12 00:40, Marek Szyprowski wrote:
>>>>> Hi Alex,
>>>>>
>>>>> On Tuesday, July 03, 2012 4:45 AM Alex Villacís Lasso wrote:
>>>>>
>>>>>> -------- Original Message --------
>>>>>> Subject: BISECTED: Re: REGRESSION: 3.4.0->3.5.0-rc2 kernel WARNING on cable plug on Acer Aspire One, no network
>>>>>> Date: Mon, 02 Jul 2012 21:33:41 -0500
>>>>>> From: Alex Villacís Lasso <a_villacis@palosanto.com>
>>>>>> To: Francois Romieu <romieu@fr.zoreil.com>
>>>>>> CC: netdev@vger.kernel.org
>>>>>> On 01/07/12 08:50, Alex Villacís Lasso wrote:
>>>>>>> On 11/06/12 16:38, Francois Romieu wrote:
>>>>>>>> Alex Villacís Lasso <a_villacis@palosanto.com> :
>>>>>>>> [...]
>>>>>>>>> $ grep XID dmesg-3.5.0-rc2.txt
>>>>>>>>> [ 15.873858] r8169 0000:02:00.0: eth0: RTL8102e at 0xf7c0e000,
>>>>>>>>> 00:1e:68:e5:5d:b1, XID 04a00000 IRQ 44
>>>>>>>> The 8102e has not been touched by that many suspect patches but I do
>>>>>>>> not see where the problem is :o(
>>>>>>>>
>>>>>>>> Can you peel off the r8169 patches between 3.4.0 and 3.5-rc ?
>>>>>>>>
>>>>>>> Still present in 3.5-rc5. Bisection still in progress.
>>>>>>>
>>>>>> My full bisection points to this commit:
>>>>>>
>>>>>> commit 0a2b9a6ea93650b8a00f9fd5ee8fdd25671e2df6
>>>>>> Author: Marek Szyprowski <m.szyprowski@samsung.com>
>>>>>> Date: Thu Dec 29 13:09:51 2011 +0100
>>>>>>
>>>>>> X86: integrate CMA with DMA-mapping subsystem
>>>>>>
>>>>>> This patch adds support for CMA to dma-mapping subsystem for x86
>>>>>> architecture that uses common pci-dma/pci-nommu implementation. This
>>>>>> allows to test CMA on KVM/QEMU and a lot of common x86 boxes.
>>>>>>
>>>>>> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
>>>>>> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
>>>>>> CC: Michal Nazarewicz <mina86@mina86.com>
>>>>>> Acked-by: Arnd Bergmann <arnd@arndb.de>
>>>>>>
>>>>>> Is this commit somehow messing with the network card DMA?
>>>>> This commit does touch the DMA-mapping subsystem and introduced a bug,
>>>>> which was finally fixed by commit c080e26edc3a2a3, merged in v3.5-rc3.
>>>>> With that fix applied, the DMA-mapping subsystem should work exactly the same way
>>>>> as in v3.4. Could you please check if it fixes this issue?
>>>>>
>>>>> Best regards
>>>> No. It still fails in 3.5-rc5, as mentioned before.
>>> Hmm. I was a bit confused, because both the subject and git bisect log pointed to v3.5-rc2,
>>> which had that bug. Maybe there is some other issue present in v3.5-rc5 that is not related to
>>> my patches?
>>>
>>> Could you check with v3.5-rc5 if reverting patch c080e26edc3a2a3cdfa4c430c663ee1c3bbd8fae
>>> and 0a2b9a6ea93650b8a00f9fd5ee8fdd25671e2df6 fixes the problems with rtl driver?
>>>
>>> Best regards
>> Reverting the two patches indeed fixes the bug on -rc5.
> That's really strange. Could you check if you have CMA disabled in the config? After preparing
> a c080e26edc3a2a3cdfa4c430c663ee1c3bbd8fae fixup patch, I was really convinced that there are
> no functional changes in x86 dma mapping code when CMA is disabled. I will provide some
> patches to revert different parts of my changes, so we will find which line causes issues.
>
> Best regards
The affected system is an Acer Aspire One, a 32-bit-only system. The
CMA option does not appear in menuconfig at all, so it can be neither
enabled nor disabled there, and it does not appear in
the .config file as either set or unset. I assume this means that CMA is
disabled.
^ permalink raw reply
* [PATCH net] cnic: Don't use netdev->base_addr
From: Michael Chan @ 2012-07-06 0:21 UTC (permalink / raw)
To: davem; +Cc: netdev
commit c0357e975afdbbedab5c662d19bef865f02adc17
bnx2: stop using net_device.{base_addr, irq}.
removed netdev->base_addr so we need to update cnic to get the MMIO
base address from pci_resource_start(). Otherwise, mmap of the uio
device will fail.
Signed-off-by: Michael Chan <mchan@broadcom.com>
---
drivers/net/ethernet/broadcom/cnic.c | 7 +++++--
1 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/cnic.c b/drivers/net/ethernet/broadcom/cnic.c
index c95e7b5..3c95065 100644
--- a/drivers/net/ethernet/broadcom/cnic.c
+++ b/drivers/net/ethernet/broadcom/cnic.c
@@ -1053,12 +1053,13 @@ static int cnic_init_uio(struct cnic_dev *dev)
uinfo = &udev->cnic_uinfo;
- uinfo->mem[0].addr = dev->netdev->base_addr;
+ uinfo->mem[0].addr = pci_resource_start(dev->pcidev, 0);
uinfo->mem[0].internal_addr = dev->regview;
- uinfo->mem[0].size = dev->netdev->mem_end - dev->netdev->mem_start;
uinfo->mem[0].memtype = UIO_MEM_PHYS;
if (test_bit(CNIC_F_BNX2_CLASS, &dev->flags)) {
+ uinfo->mem[0].size = MB_GET_CID_ADDR(TX_TSS_CID +
+ TX_MAX_TSS_RINGS + 1);
uinfo->mem[1].addr = (unsigned long) cp->status_blk.gen &
PAGE_MASK;
if (cp->ethdev->drv_state & CNIC_DRV_STATE_USING_MSIX)
@@ -1068,6 +1069,8 @@ static int cnic_init_uio(struct cnic_dev *dev)
uinfo->name = "bnx2_cnic";
} else if (test_bit(CNIC_F_BNX2X_CLASS, &dev->flags)) {
+ uinfo->mem[0].size = pci_resource_len(dev->pcidev, 0);
+
uinfo->mem[1].addr = (unsigned long) cp->bnx2x_def_status_blk &
PAGE_MASK;
uinfo->mem[1].size = sizeof(*cp->bnx2x_def_status_blk);
--
1.7.1
^ permalink raw reply related
* Re: [PATCH net-next] cnic: Fix mmap regression.
From: Michael Chan @ 2012-07-05 23:34 UTC (permalink / raw)
To: David Miller; +Cc: netdev
In-Reply-To: <20120705.153638.790030674286651971.davem@davemloft.net>
On Thu, 2012-07-05 at 15:36 -0700, David Miller wrote:
> From: "Michael Chan" <mchan@broadcom.com>
> Date: Thu, 5 Jul 2012 14:59:46 -0700
>
> > Or you want me to send you the equivalent patches for net.
>
> Please do so.
>
OK. I'll send you a single patch that fixes it in net, rather than one
patch that causes the regression and another that fixes it.
^ permalink raw reply
* Re: [PATCH] force dentry revalidation after namespace change
From: Eric W. Biederman @ 2012-07-05 23:31 UTC (permalink / raw)
To: Glauber Costa
Cc: linux-kernel, netdev, Andrew Morton, Tejun Heo,
Greg Kroah-Hartman
In-Reply-To: <1341496805-26394-1-git-send-email-glommer@parallels.com>
Glauber Costa <glommer@parallels.com> writes:
> When we change the namespace tag of a sysfs entry, the associated dentry
> is still kept around. readdir() will work correctly and not display the
> old entries, but open() will still succeed, so will reads and writes.
>
> This will no longer happen if sysfs is remounted, hinting that this is a
> cache-related problem.
Equivalently to remounting, you can do
echo 3 > /proc/sys/vm/drop_caches
> I am using the following sequence to demonstrate that:
>
> shell1:
> ip link add type veth
> unshare -nm
>
> shell2:
> ip link set veth1 <pid_of_shell_1>
> cat /sys/devices/virtual/net/veth1/ifindex
>
> Before that patch, this will succeed (fail to fail). After it, it will
> correctly return an error. Differently from a normal rename, which we
> handle fine, changing the object namespace will keep its path intact.
> So this check seems necessary as well.
Overall good bug spotting, and good spotting of where the fix should
live.
Your summary should have said:
[PATCH] fail dentry revalidation after namespace change
And you have the test slightly wrong below.
> Signed-off-by: Glauber Costa <glommer@parallels.com>
> CC: Tejun Heo <tj@kernel.org>
> CC: Eric W. Biederman <ebiederm@xmission.com>
> CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> ---
> fs/sysfs/dir.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
> index e6bb9b2..c24bdd9 100644
> --- a/fs/sysfs/dir.c
> +++ b/fs/sysfs/dir.c
> @@ -307,6 +307,7 @@ static int sysfs_dentry_revalidate(struct dentry *dentry, struct nameidata *nd)
> {
> struct sysfs_dirent *sd;
> int is_dir;
> + int type;
>
> if (nd->flags & LOOKUP_RCU)
> return -ECHILD;
> @@ -314,6 +315,10 @@ static int sysfs_dentry_revalidate(struct dentry *dentry, struct nameidata *nd)
> sd = dentry->d_fsdata;
> mutex_lock(&sysfs_mutex);
>
> + type = sysfs_ns_type(sd);
> + if (sd->s_ns && (sysfs_info(dentry->d_sb)->ns[type] != sd->s_ns))
> + goto out_bad;
> +
First, this check should go further down, after the other rename
checks.
Second, the test should be:
	type = KOBJ_NS_TYPE_NONE;
	if (sd->s_parent)
		type = sysfs_ns_type(sd->s_parent);
	if (type && (sysfs_info(dentry->d_sb)->ns[type] != sd->s_ns))
		goto out_bad;
The important difference is that the type comes from the directory the
dirent is in, not from the dirent itself.
> /* The sysfs dirent has been deleted */
> if (sd->s_flags & SYSFS_FLAG_REMOVED)
> goto out_bad;
Glauber, do you think you can fix your patch and resubmit?
Eric
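The corrected test suggested in this review can be modelled in user space to show why the namespace type has to come from the parent directory. Everything in the sketch is a hypothetical stand-in (struct dirent_model, struct sb_model, dirent_ns_is_stale, the MODEL_NS_* constants); it mirrors the logic of the suggested check, not the sysfs data structures.

```c
#include <stdbool.h>
#include <stddef.h>

enum { MODEL_NS_TYPE_NONE = 0, MODEL_NS_TYPE_NET = 1, MODEL_NS_TYPES };

/* Hypothetical stand-in for a sysfs dirent. */
struct dirent_model {
	struct dirent_model *parent;
	int child_ns_type; /* namespace type of the entries in this directory */
	const void *ns;    /* namespace tag this entry carries */
};

/* Hypothetical stand-in for the per-mount namespace tags. */
struct sb_model {
	const void *ns[MODEL_NS_TYPES];
};

/* An entry is stale when its parent directory is tagged by namespace
 * and the entry's tag no longer matches the tag the mount was made with. */
static bool dirent_ns_is_stale(const struct sb_model *sb,
			       const struct dirent_model *sd)
{
	int type = MODEL_NS_TYPE_NONE;

	if (sd->parent)
		type = sd->parent->child_ns_type;
	return type != MODEL_NS_TYPE_NONE && sb->ns[type] != sd->ns;
}
```

Because the type is read from the parent, an entry in an untagged directory is never reported stale, even if its own tag pointer happens to be set.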
^ permalink raw reply
* Re: [iproute2] display vlan configuration
From: John Fastabend @ 2012-07-05 23:20 UTC (permalink / raw)
To: Fabien C.; +Cc: netdev
In-Reply-To: <4FF61DE9.7000507@jetable.org>
On 7/5/2012 4:06 PM, Fabien C. wrote:
> Hello,
>
> it looks like there is no way to show the vlan configuration with iproute (nor with any other tool apparently).
>
> This can lead to trouble since :
> # ip link add link eth0 name eth2.333 type vlan id 444
>
> will create an interface that will show up like this with "ip link show" :
> 51: eth2.333@eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
>
> The only hint we have is the interface name, which may not be related to the vlan id we set earlier.
Here you need to ask for the details:
# ip -d link show dev eth2.333
From my current setup:
# ip -d link show dev vlan0
33: vlan0@eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
link/ether 00:1b:21:55:23:59 brd ff:ff:ff:ff:ff:ff
vlan id 101 <REORDER_HDR>
^ permalink raw reply
* [iproute2] display vlan configuration
From: Fabien C. @ 2012-07-05 23:06 UTC (permalink / raw)
To: netdev
Hello,
it looks like there is no way to show the vlan configuration with iproute (nor with any other tool apparently).
This can lead to trouble since :
# ip link add link eth0 name eth2.333 type vlan id 444
will create an interface that will show up like this with "ip link show" :
51: eth2.333@eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
The only hint we have is the interface name, which may not be related to the vlan id we set earlier.
Is there any way to get that information?
Thanks,
Fabien
^ permalink raw reply
* Re: [B.A.T.M.A.N.] [PATCH net] Bug fix for batman-adv 2012-07-06
From: Antonio Quartulli @ 2012-07-05 22:51 UTC (permalink / raw)
To: davem; +Cc: netdev, b.a.t.m.a.n
In-Reply-To: <1341528514-27906-1-git-send-email-ordex@autistici.org>
[-- Attachment #1: Type: text/plain, Size: 4380 bytes --]
On Fri, Jul 06, 2012 at 12:48:33 +0200, Antonio Quartulli wrote:
> here I have a fix intended for net/linux-3.5.
...
Hello David,
here you have our instructions to resolve the conflicts that you will hit while
merging net into net-next:
Conflict 1 (bridge_loop_avoidance.c):
<<<<<<<
int batadv_bla_rx(struct batadv_priv *bat_priv, struct sk_buff *skb, short vid)
=======
int bla_rx(struct bat_priv *bat_priv, struct sk_buff *skb, short vid,
bool is_bcast)
>>>>>>>
resolves to:
int batadv_bla_rx(struct batadv_priv *bat_priv, struct sk_buff *skb, short vid,
bool is_bcast)
Conflict 2 (bridge_loop_avoidance.h):
<<<<<<<
int batadv_bla_rx(struct batadv_priv *bat_priv, struct sk_buff *skb, short vid);
int batadv_bla_tx(struct batadv_priv *bat_priv, struct sk_buff *skb, short vid);
int batadv_bla_is_backbone_gw(struct sk_buff *skb,
struct batadv_orig_node *orig_node, int hdr_size);
int batadv_bla_claim_table_seq_print_text(struct seq_file *seq, void *offset);
int batadv_bla_is_backbone_gw_orig(struct batadv_priv *bat_priv, uint8_t *orig);
int batadv_bla_check_bcast_duplist(struct batadv_priv *bat_priv,
struct batadv_bcast_packet *bcast_packet,
int hdr_size);
void batadv_bla_update_orig_address(struct batadv_priv *bat_priv,
struct batadv_hard_iface *primary_if,
struct batadv_hard_iface *oldif);
int batadv_bla_init(struct batadv_priv *bat_priv);
void batadv_bla_free(struct batadv_priv *bat_priv);
=======
int bla_rx(struct bat_priv *bat_priv, struct sk_buff *skb, short vid,
bool is_bcast);
int bla_tx(struct bat_priv *bat_priv, struct sk_buff *skb, short vid);
int bla_is_backbone_gw(struct sk_buff *skb,
struct orig_node *orig_node, int hdr_size);
int bla_claim_table_seq_print_text(struct seq_file *seq, void *offset);
int bla_is_backbone_gw_orig(struct bat_priv *bat_priv, uint8_t *orig);
int bla_check_bcast_duplist(struct bat_priv *bat_priv,
struct bcast_packet *bcast_packet, int hdr_size);
void bla_update_orig_address(struct bat_priv *bat_priv,
struct hard_iface *primary_if,
struct hard_iface *oldif);
int bla_init(struct bat_priv *bat_priv);
void bla_free(struct bat_priv *bat_priv);
>>>>>>>
resolves to:
int batadv_bla_rx(struct batadv_priv *bat_priv, struct sk_buff *skb, short vid,
bool is_bcast);
int batadv_bla_tx(struct batadv_priv *bat_priv, struct sk_buff *skb, short vid);
int batadv_bla_is_backbone_gw(struct sk_buff *skb,
struct batadv_orig_node *orig_node, int hdr_size);
int batadv_bla_claim_table_seq_print_text(struct seq_file *seq, void *offset);
int batadv_bla_is_backbone_gw_orig(struct batadv_priv *bat_priv, uint8_t *orig);
int batadv_bla_check_bcast_duplist(struct batadv_priv *bat_priv,
struct batadv_bcast_packet *bcast_packet,
int hdr_size);
void batadv_bla_update_orig_address(struct batadv_priv *bat_priv,
struct batadv_hard_iface *primary_if,
struct batadv_hard_iface *oldif);
int batadv_bla_init(struct batadv_priv *bat_priv);
void batadv_bla_free(struct batadv_priv *bat_priv);
Conflict 3 (bridge_loop_avoidance.h):
<<<<<<<
static inline int batadv_bla_rx(struct batadv_priv *bat_priv,
struct sk_buff *skb, short vid)
=======
static inline int bla_rx(struct bat_priv *bat_priv, struct sk_buff *skb,
short vid, bool is_bcast)
>>>>>>>
resolves to:
static inline int batadv_bla_rx(struct batadv_priv *bat_priv,
struct sk_buff *skb, short vid, bool is_bcast)
Conflict 4 (soft-interface.c):
<<<<<<<
__be16 ethertype = __constant_htons(BATADV_ETH_P_BATMAN);
=======
bool is_bcast;
is_bcast = (batadv_header->packet_type == BAT_BCAST);
>>>>>>>
resolves to:
bool is_bcast;
__be16 ethertype = __constant_htons(BATADV_ETH_P_BATMAN);
is_bcast = (batadv_header->packet_type == BATADV_BCAST);
Conflict 5 (soft-interface.c):
<<<<<<<
if (batadv_bla_rx(bat_priv, skb, vid))
=======
if (bla_rx(bat_priv, skb, vid, is_bcast))
>>>>>>>
resolves to:
if (batadv_bla_rx(bat_priv, skb, vid, is_bcast))
Wrong merge by git (soft-interface.c):
line 270 must look like this:
struct batadv_header *batadv_header = (struct batadv_header *)skb->data;
--
Antonio Quartulli
..each of us alone is worth nothing..
Ernesto "Che" Guevara
[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply
* [PATCH net] batman-adv: check incoming packet type for bla
From: Antonio Quartulli @ 2012-07-05 22:48 UTC (permalink / raw)
To: davem; +Cc: netdev, b.a.t.m.a.n, Simon Wunderlich, Simon Wunderlich
In-Reply-To: <1341528514-27906-1-git-send-email-ordex@autistici.org>
From: Simon Wunderlich <simon.wunderlich@s2003.tu-chemnitz.de>
If the gateway functionality is used, some broadcast packets (DHCP
requests) may be transmitted as unicast packets. As the bridge loop
avoidance code now only considers the payload Ethernet destination,
it may drop the DHCP request for clients which are claimed by other
backbone gateways, because it falsely infers from the broadcast address
that the right backbone gateway should have handled the broadcast.
Fix this by checking and delegating the batman-adv packet type used
for transmission.
Reported-by: Guido Iribarren <guidoiribarren@buenosaireslibre.org>
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
---
net/batman-adv/bridge_loop_avoidance.c | 15 +++++++++++----
net/batman-adv/bridge_loop_avoidance.h | 5 +++--
net/batman-adv/soft-interface.c | 6 +++++-
3 files changed, 19 insertions(+), 7 deletions(-)
diff --git a/net/batman-adv/bridge_loop_avoidance.c b/net/batman-adv/bridge_loop_avoidance.c
index 8bf9751..c5863f4 100644
--- a/net/batman-adv/bridge_loop_avoidance.c
+++ b/net/batman-adv/bridge_loop_avoidance.c
@@ -1351,6 +1351,7 @@ void bla_free(struct bat_priv *bat_priv)
* @bat_priv: the bat priv with all the soft interface information
* @skb: the frame to be checked
* @vid: the VLAN ID of the frame
+ * @is_bcast: the packet came in a broadcast packet type.
*
* bla_rx avoidance checks if:
* * we have to race for a claim
@@ -1361,7 +1362,8 @@ void bla_free(struct bat_priv *bat_priv)
* process the skb.
*
*/
-int bla_rx(struct bat_priv *bat_priv, struct sk_buff *skb, short vid)
+int bla_rx(struct bat_priv *bat_priv, struct sk_buff *skb, short vid,
+ bool is_bcast)
{
struct ethhdr *ethhdr;
struct claim search_claim, *claim = NULL;
@@ -1380,7 +1382,7 @@ int bla_rx(struct bat_priv *bat_priv, struct sk_buff *skb, short vid)
if (unlikely(atomic_read(&bat_priv->bla_num_requests)))
/* don't allow broadcasts while requests are in flight */
- if (is_multicast_ether_addr(ethhdr->h_dest))
+ if (is_multicast_ether_addr(ethhdr->h_dest) && is_bcast)
goto handled;
memcpy(search_claim.addr, ethhdr->h_source, ETH_ALEN);
@@ -1406,8 +1408,13 @@ int bla_rx(struct bat_priv *bat_priv, struct sk_buff *skb, short vid)
}
/* if it is a broadcast ... */
- if (is_multicast_ether_addr(ethhdr->h_dest)) {
- /* ... drop it. the responsible gateway is in charge. */
+ if (is_multicast_ether_addr(ethhdr->h_dest) && is_bcast) {
+ /* ... drop it. the responsible gateway is in charge.
+ *
+ * We need to check is_bcast because with the gateway
+ * feature, broadcasts (like DHCP requests) may be sent
+ * using a unicast packet type.
+ */
goto handled;
} else {
/* seems the client considers us as its best gateway.
diff --git a/net/batman-adv/bridge_loop_avoidance.h b/net/batman-adv/bridge_loop_avoidance.h
index e39f93a..dc5227b 100644
--- a/net/batman-adv/bridge_loop_avoidance.h
+++ b/net/batman-adv/bridge_loop_avoidance.h
@@ -23,7 +23,8 @@
#define _NET_BATMAN_ADV_BLA_H_
#ifdef CONFIG_BATMAN_ADV_BLA
-int bla_rx(struct bat_priv *bat_priv, struct sk_buff *skb, short vid);
+int bla_rx(struct bat_priv *bat_priv, struct sk_buff *skb, short vid,
+ bool is_bcast);
int bla_tx(struct bat_priv *bat_priv, struct sk_buff *skb, short vid);
int bla_is_backbone_gw(struct sk_buff *skb,
struct orig_node *orig_node, int hdr_size);
@@ -41,7 +42,7 @@ void bla_free(struct bat_priv *bat_priv);
#else /* ifdef CONFIG_BATMAN_ADV_BLA */
static inline int bla_rx(struct bat_priv *bat_priv, struct sk_buff *skb,
- short vid)
+ short vid, bool is_bcast)
{
return 0;
}
diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c
index 6e2530b..a0ec0e4 100644
--- a/net/batman-adv/soft-interface.c
+++ b/net/batman-adv/soft-interface.c
@@ -256,7 +256,11 @@ void interface_rx(struct net_device *soft_iface,
struct bat_priv *bat_priv = netdev_priv(soft_iface);
struct ethhdr *ethhdr;
struct vlan_ethhdr *vhdr;
+ struct batman_header *batadv_header = (struct batman_header *)skb->data;
short vid __maybe_unused = -1;
+ bool is_bcast;
+
+ is_bcast = (batadv_header->packet_type == BAT_BCAST);
/* check if enough space is available for pulling, and pull */
if (!pskb_may_pull(skb, hdr_size))
@@ -302,7 +306,7 @@ void interface_rx(struct net_device *soft_iface,
/* Let the bridge loop avoidance check the packet. If will
* not handle it, we can safely push it up.
*/
- if (bla_rx(bat_priv, skb, vid))
+ if (bla_rx(bat_priv, skb, vid, is_bcast))
goto out;
netif_rx(skb);
--
1.7.9.4
^ permalink raw reply related
* [PATCH net] Bug fix for batman-adv 2012-07-06
From: Antonio Quartulli @ 2012-07-05 22:48 UTC (permalink / raw)
To: davem; +Cc: netdev, b.a.t.m.a.n
Here is a fix intended for net/linux-3.5.
The bug, discovered by Guido Iribarren and fixed by Simon Wunderlich, is caused
by a faulty interaction between the Bridge Loop Avoidance and the Gateway
feature of batman-adv.
Let me know if there are problems.
Thank you,
Antonio
The following changes since commit 9e85a6f9dc231f3ed3c1dc1b12217505d970142a:
Merge tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mturquette/linux (2012-07-03 18:06:49 -0700)
are available in the git repository at:
git://git.open-mesh.org/linux-merge.git tags/batman-adv-fix-for-davem
for you to fetch changes up to 2d3f6ccc4ea5c74d4b4af1b47c56b4cff4bbfcb7:
batman-adv: check incoming packet type for bla (2012-07-06 00:08:46 +0200)
----------------------------------------------------------------
Included changes:
- fix a bug generated by the wrong interaction between the GW feature and the
Bridge Loop Avoidance
----------------------------------------------------------------
Simon Wunderlich (1):
batman-adv: check incoming packet type for bla
net/batman-adv/bridge_loop_avoidance.c | 15 +++++++++++----
net/batman-adv/bridge_loop_avoidance.h | 5 +++--
net/batman-adv/soft-interface.c | 6 +++++-
3 files changed, 19 insertions(+), 7 deletions(-)
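
The drop decision changed by this fix can be sketched in isolation as below.
This is only a userspace model, not the batman-adv code: the enum, the helper
names dest_is_multicast() and bla_should_drop(), and the reduction to a single
predicate are assumptions made for illustration. It shows why a frame with a
broadcast Ethernet destination that arrived in a unicast packet type (for
example a DHCP request forwarded by the gateway feature) is no longer dropped.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical packet types, standing in for the batman-adv header field. */
enum pkt_type { PKT_UNICAST, PKT_BCAST };

/* Stand-in for is_multicast_ether_addr(): multicast and broadcast
 * addresses have the lowest bit of the first octet set.
 */
static bool dest_is_multicast(const uint8_t *addr)
{
	return (addr[0] & 0x01) != 0;
}

/* Drop only when the inner frame is multicast/broadcast AND it really
 * travelled through the mesh as a broadcast packet.
 */
static bool bla_should_drop(const uint8_t *dest, enum pkt_type type)
{
	bool is_bcast = (type == PKT_BCAST);

	return dest_is_multicast(dest) && is_bcast;
}
```

With the pre-fix condition (destination check only), the second case in the
model, a broadcast destination carried in a unicast packet, would have been
dropped as well.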
^ permalink raw reply
* Re: [PATCH net-next] cnic: Fix mmap regression.
From: David Miller @ 2012-07-05 22:36 UTC (permalink / raw)
To: mchan; +Cc: netdev
In-Reply-To: <1341525586.7472.25.camel@LTIRV-MCHAN1.corp.ad.broadcom.com>
From: "Michael Chan" <mchan@broadcom.com>
Date: Thu, 5 Jul 2012 14:59:46 -0700
> Or do you want me to send you the equivalent patches for net?
Please do so.
^ permalink raw reply
* Re: [PATCH] force dentry revalidation after namespace change
From: Serge E. Hallyn @ 2012-07-05 22:17 UTC (permalink / raw)
To: Glauber Costa
Cc: linux-kernel, netdev, Andrew Morton, Tejun Heo, Eric W. Biederman,
Greg Kroah-Hartman
In-Reply-To: <1341496805-26394-1-git-send-email-glommer@parallels.com>
Quoting Glauber Costa (glommer@parallels.com):
> When we change the namespace tag of a sysfs entry, the associated dentry
> is still kept around. readdir() will work correctly and not display the
> old entries, but open() will still succeed, so will reads and writes.
>
> This will no longer happen if sysfs is remounted, hinting that this is a
> cache-related problem.
>
> I am using the following sequence to demonstrate that:
>
> shell1:
> ip link add type veth
> unshare -nm
>
> shell2:
> ip link set veth1 <pid_of_shell_1>
> cat /sys/devices/virtual/net/veth1/ifindex
>
> Before that patch, this will succeed (fail to fail). After it, it will
Confirmed that it currently fails to fail :)
> correctly return an error. Unlike a normal rename, which we
> handle fine, changing the object namespace will keep its path intact.
> So this check seems necessary as well.
>
> Signed-off-by: Glauber Costa <glommer@parallels.com>
Haven't run it, but the patch looks good. Thanks, Glauber.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
> CC: Tejun Heo <tj@kernel.org>
> CC: Eric W. Biederman <ebiederm@xmission.com>
> CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> ---
> fs/sysfs/dir.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
> index e6bb9b2..c24bdd9 100644
> --- a/fs/sysfs/dir.c
> +++ b/fs/sysfs/dir.c
> @@ -307,6 +307,7 @@ static int sysfs_dentry_revalidate(struct dentry *dentry, struct nameidata *nd)
> {
> struct sysfs_dirent *sd;
> int is_dir;
> + int type;
>
> if (nd->flags & LOOKUP_RCU)
> return -ECHILD;
> @@ -314,6 +315,10 @@ static int sysfs_dentry_revalidate(struct dentry *dentry, struct nameidata *nd)
> sd = dentry->d_fsdata;
> mutex_lock(&sysfs_mutex);
>
> + type = sysfs_ns_type(sd);
> + if (sd->s_ns && (sysfs_info(dentry->d_sb)->ns[type] != sd->s_ns))
> + goto out_bad;
> +
> /* The sysfs dirent has been deleted */
> if (sd->s_flags & SYSFS_FLAG_REMOVED)
> goto out_bad;
> --
> 1.7.10.4
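
For readers following the thread, the added check can be modelled outside the
kernel roughly as follows. This is a sketch under stated assumptions: the
struct and enum names (sb_model, dirent_model, ns_type) are invented for
illustration and only mirror the fields the patch touches; locking and the
remaining revalidation checks are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical namespace types, loosely mirroring the per-type tag array. */
enum ns_type { NS_TYPE_NONE, NS_TYPE_NET, NS_TYPES };

/* Model of the per-superblock info: one namespace tag per type. */
struct sb_model {
	const void *ns[NS_TYPES];
};

/* Model of a sysfs dirent: optional namespace tag plus a removed flag. */
struct dirent_model {
	const void *ns;
	enum ns_type type;
	bool removed;
};

/* Returns false when the cached dentry must be dropped. */
static bool dentry_still_valid(const struct sb_model *sb,
			       const struct dirent_model *sd)
{
	/* New check: a tagged entry is only valid in the matching namespace. */
	if (sd->ns && sb->ns[sd->type] != sd->ns)
		return false;

	/* Existing check: the dirent has been deleted. */
	if (sd->removed)
		return false;

	return true;
}
```

Moving veth1 into another namespace corresponds to changing sd->ns while the
path stays the same, which is exactly the case the old code let through.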
>
^ permalink raw reply
* Re: [PATCH net-next] cnic: Fix mmap regression.
From: Michael Chan @ 2012-07-05 21:59 UTC (permalink / raw)
To: David Miller; +Cc: netdev
In-Reply-To: <20120629.153425.24594752441419170.davem@davemloft.net>
On Fri, 2012-06-29 at 15:34 -0700, David Miller wrote:
> From: "Michael Chan" <mchan@broadcom.com>
> Date: Fri, 29 Jun 2012 12:32:45 -0700
>
> > commit 1f85d58cdf15354a7120fc9ccc9bb9c45b53af88
> > cnic: Remove uio mem[0].
> >
> > introduced a regression as older versions of userspace app still rely
> > on this mmap. Restore the mmap functionality and get the base address
> > from pci_resource_start() as the netdev->base_addr has been deprecated for
> > PCI devices.
> >
> > Update version to 2.5.12.
> >
> > Signed-off-by: Michael Chan <mchan@broadcom.com>
>
> I really couldn't believe what you guys were doing in the original
> commit, but I decided to let you do stupid things and find out the
> hard way that removing any user visible interface is basically
> impossible.
>
> Applied, thanks.
>
David, this patch plus the earlier commit are also needed for the net
tree because netdev->base_addr was removed there. Can you apply these
directly to the net tree? Or do you want me to send you the equivalent
patches for net? Thanks.
^ permalink raw reply
* [PATCH] gianfar: fix potential sk_wmem_alloc imbalance
From: Eric Dumazet @ 2012-07-05 21:45 UTC (permalink / raw)
To: David Miller
Cc: netdev, Manfred Rudigier, Claudiu Manoil, Jiajun Wu,
Paul Gortmaker, Andy Fleming
From: Eric Dumazet <edumazet@google.com>
commit db83d136d7f753 (gianfar: Fix missing sock reference when
processing TX time stamps) added a potential sk_wmem_alloc imbalance.
If the new skb has a different truesize than the old one, we can get a
negative sk_wmem_alloc once the new skb is orphaned at TX completion.
Now that we no longer orphan skbs early in dev_hard_start_xmit(), this
can probably lead to fatal bugs.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Manfred Rudigier <manfred.rudigier@omicron.at>
Cc: Claudiu Manoil <claudiu.manoil@freescale.com>
Cc: Jiajun Wu <b06378@freescale.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Andy Fleming <afleming@freescale.com>
---
Note : I don't have the hardware and discovered this problem by code
analysis. So please compile and run this patch before Acking it,
thanks !
BTW, dev->needed_headroom should be set to GMAC_FCB_LEN + GMAC_TXPAL_LEN
to avoid reallocations...
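
The arithmetic behind the imbalance can be illustrated with a toy model. The
sketch below is not kernel code: sock_model, skb_model and both leftover_*()
helpers are invented for illustration, the counter starts at 0 rather than at
the kernel's initial value, and set_owner_w()/free_skb() only imitate the
charge and uncharge of truesize.

```c
#include <stddef.h>

struct sock_model { int wmem_alloc; };
struct skb_model { struct sock_model *sk; int truesize; };

/* Charge the skb's truesize to the socket, as an owner-setting helper would. */
static void set_owner_w(struct skb_model *skb, struct sock_model *sk)
{
	skb->sk = sk;
	sk->wmem_alloc += skb->truesize;
}

/* Freeing an owned skb uncharges its own truesize. */
static void free_skb(struct skb_model *skb)
{
	if (skb->sk)
		skb->sk->wmem_alloc -= skb->truesize;
	skb->sk = NULL;
}

/* Old scheme: the new skb steals the owner without being charged. */
static int leftover_after_steal(int old_truesize, int new_truesize)
{
	struct sock_model sk = { 0 };
	struct skb_model old = { NULL, old_truesize };
	struct skb_model new_skb = { NULL, new_truesize };

	set_owner_w(&old, &sk);
	new_skb.sk = old.sk;	/* swap(skb_new->sk, skb->sk) */
	old.sk = NULL;
	free_skb(&old);		/* no owner left, nothing uncharged */
	free_skb(&new_skb);	/* uncharges new_truesize, not old_truesize */
	return sk.wmem_alloc;
}

/* New scheme: charge the new skb, then release the old one normally. */
static int leftover_after_reown(int old_truesize, int new_truesize)
{
	struct sock_model sk = { 0 };
	struct skb_model old = { NULL, old_truesize };
	struct skb_model new_skb = { NULL, new_truesize };

	set_owner_w(&old, &sk);
	if (old.sk)
		set_owner_w(&new_skb, old.sk);
	free_skb(&old);
	free_skb(&new_skb);
	return sk.wmem_alloc;
}
```

In this model, an old truesize of 512 and a new truesize of 768 leave the
counter at -256 with the stealing scheme and at 0 with the re-owning scheme.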
drivers/net/ethernet/freescale/gianfar.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index f2db8fc..ab1d80f 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -2063,10 +2063,9 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev)
return NETDEV_TX_OK;
}
- /* Steal sock reference for processing TX time stamps */
- swap(skb_new->sk, skb->sk);
- swap(skb_new->destructor, skb->destructor);
- kfree_skb(skb);
+ if (skb->sk)
+ skb_set_owner_w(skb_new, skb->sk);
+ consume_skb(skb);
skb = skb_new;
}
^ permalink raw reply related
* Re: [PATCH 0/5] rtcache remove respin
From: David Miller @ 2012-07-05 21:32 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev
In-Reply-To: <1341515017.3265.6.camel@edumazet-glaptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 05 Jul 2012 21:03:37 +0200
> If route cache is removed, I believe we can remove all paddings.
>
> Each tcp session will have its own dst_entry, instead of being shared.
Not really, the routing cache removal patches have poor performance
and won't go-in as-is. :-) Once PMTU/redirect/TCP-metrics are reworked
I plan to do things like the patch below to make the performance loss
more acceptable.
And then I'll do the same for input routes too, at which point your
'noref' case can be put back.
So really, we have to consider how to rework the layout of this
structure.
Thanks.
====================
ipv4: Cache output routes in fib_info nexthops.
Signed-off-by: David S. Miller <davem@davemloft.net>
---
include/net/ip_fib.h | 3 +++
net/ipv4/fib_semantics.c | 2 ++
net/ipv4/route.c | 9 +++++++++
3 files changed, 14 insertions(+)
diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 3dc7c96..ff9f0c4 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -45,6 +45,7 @@ struct fib_config {
};
struct fib_info;
+struct rtable;
struct fib_nh {
struct net_device *nh_dev;
@@ -63,6 +64,8 @@ struct fib_nh {
__be32 nh_gw;
__be32 nh_saddr;
int nh_saddr_genid;
+
+ struct rtable *rth;
};
/*
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index c46c20b..f3ada74 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -148,6 +148,8 @@ static void free_fib_info_rcu(struct rcu_head *head)
change_nexthops(fi) {
if (nexthop_nh->nh_dev)
dev_put(nexthop_nh->nh_dev);
+ if (nexthop_nh->rth)
+ dst_release(&nexthop_nh->rth->dst);
} endfor_nexthops(fi);
release_net(fi->fib_net);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 9f68f74..35bfd98 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -914,6 +914,8 @@ static void rt_set_nexthop(struct rtable *rt, const struct flowi4 *fl4,
#ifdef CONFIG_IP_ROUTE_CLASSID
dst->tclassid = FIB_RES_NH(*res).nh_tclassid;
#endif
+ FIB_RES_NH(*res).rth = rt;
+ dst_clone(&rt->dst);
}
if (dst_mtu(dst) > IP_MAX_MTU)
@@ -1399,6 +1401,13 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
fi = NULL;
}
+ if (fi) {
+ rth = FIB_RES_NH(*res).rth;
+ if (rth) {
+ dst_use(&rth->dst, jiffies);
+ return rth;
+ }
+ }
rth = rt_dst_alloc(dev_out,
IN_DEV_CONF_GET(in_dev, NOPOLICY),
IN_DEV_CONF_GET(in_dev, NOXFRM));
--
1.7.10
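
As a rough illustration of the idea in this patch, the per-nexthop cache can
be modelled as below. It is a simplified userspace sketch, not the routing
code: route_model, nexthop_model and mkroute_output_model() are hypothetical
names, reference counting is a plain int, and invalidation of the cached
entry is not modelled.

```c
#include <stdlib.h>

struct route_model {
	int refcnt;	/* references held by callers and by the cache */
	int use;	/* number of cache hits */
};

struct nexthop_model {
	struct route_model *rth;	/* cached output route, or NULL */
};

/* Return the cached route for this nexthop, creating it on first use. */
static struct route_model *mkroute_output_model(struct nexthop_model *nh)
{
	struct route_model *rt = nh->rth;

	if (rt) {
		/* Cache hit: take a reference and account the use. */
		rt->refcnt++;
		rt->use++;
		return rt;
	}

	rt = calloc(1, sizeof(*rt));
	if (!rt)
		return NULL;

	rt->refcnt = 1;		/* the caller's reference */
	nh->rth = rt;
	rt->refcnt++;		/* the cache's own reference */
	return rt;
}
```

Two lookups through the same nexthop return the same object, so sessions
sharing a nexthop share one entry instead of each allocating their own.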
^ permalink raw reply related
* Re: ipv6 problem with 6lowpan
From: David Miller @ 2012-07-05 21:22 UTC (permalink / raw)
To: alex.bluesman.smirnov; +Cc: netdev
In-Reply-To: <CAJmB2rD8U1ihy4Ai6y5QGjj4f7txDabszesrNrQ=pgEbscePqQ@mail.gmail.com>
Should be fixed by Steffen Klassert's patch, which I just pushed into net-next.
^ permalink raw reply
* Re: [net-next:master] general protection fault in __nla_put()
From: David Miller @ 2012-07-05 21:22 UTC (permalink / raw)
To: wfg; +Cc: netdev
In-Reply-To: <20120705134857.GA14643@localhost>
Steffen Klassert posted a patch which fixes this.
^ permalink raw reply
* Re: [PATCH net-next] ipv6: Initialize the neighbour pointer of rt6_info on allocation
From: David Miller @ 2012-07-05 21:21 UTC (permalink / raw)
To: steffen.klassert; +Cc: netdev
In-Reply-To: <20120705131828.GE1869@secunet.com>
From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Thu, 5 Jul 2012 15:18:28 +0200
> git commit 97cac082 (ipv6: Store route neighbour in rt6_info struct)
> added a neighbour pointer to rt6_info. Currently we don't initialize
> this pointer at allocation time. We assume this pointer to be valid
> if it is not a null pointer, so initialize it on allocation.
>
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Applied, but as Eric said we need to find a way to avoid having to
make changes like this every time we simply want to add a struct
member to rt6_info.
^ permalink raw reply
* Re: AF_BUS socket address family
From: Jan Engelhardt @ 2012-07-05 21:06 UTC (permalink / raw)
To: Vincent Sanders; +Cc: David Miller, netdev, linux-kernel
In-Reply-To: <20120629231236.GA28593@mail.collabora.co.uk>
On Saturday 2012-06-30 01:12, Vincent Sanders wrote:
>
>Firstly it is intended as an interprocess mechanism and not to rely on
>a configured IP system; indeed, one of its primary usages is to
>provide a mechanism for various tools to set up IP networking.
Using IP as a localhost IPC is not uncommon (independent of
software preferring AF_UNIX, if so available). Distro boot
scripts have been running `ip addr add ::1/128 dev lo`
all these years.
And now we suddenly need a DBUS program just to configure
IP-based localhost IPC? I can see the flaw in that.
^ permalink raw reply
* Re: [net-next RFC V5 5/5] virtio_net: support negotiating the number of queues through ctrl vq
From: Amos Kong @ 2012-07-05 20:07 UTC (permalink / raw)
To: Sasha Levin
Cc: krkumar2, habanero, kvm, mst, netdev, mashirle, linux-kernel,
virtualization, edumazet, tahm, jwhan, davem, sri
In-Reply-To: <1341492679.18786.18.camel@lappy>
On 07/05/2012 08:51 PM, Sasha Levin wrote:
> On Thu, 2012-07-05 at 18:29 +0800, Jason Wang wrote:
>> @@ -1387,6 +1404,10 @@ static int virtnet_probe(struct virtio_device *vdev)
>> if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ))
>> vi->has_cvq = true;
>>
>> + /* Use single tx/rx queue pair as default */
>> + vi->num_queue_pairs = 1;
>> + vi->total_queue_pairs = num_queue_pairs;
vi->total_queue_pairs should also be set to 1:
vi->total_queue_pairs = 1;
>
> The code is using this "default" even if the amount of queue pairs it
> wants was specified during initialization. This basically limits any
> device to use 1 pair when starting up.
>
--
Amos.
^ permalink raw reply
* Re: [net-next RFC V5 4/5] virtio_net: multiqueue support
From: Amos Kong @ 2012-07-05 20:02 UTC (permalink / raw)
To: Jason Wang
Cc: krkumar2, habanero, mashirle, kvm, mst, netdev, linux-kernel,
virtualization, edumazet, tahm, jwhan, davem, sri
In-Reply-To: <1341484194-8108-5-git-send-email-jasowang@redhat.com>
On 07/05/2012 06:29 PM, Jason Wang wrote:
> This patch converts virtio_net to a multi queue device. After negotiating the
> VIRTIO_NET_F_MULTIQUEUE feature, the virtio device has many tx/rx queue pairs,
> and the driver can read the number from config space.
>
> The driver expects the number of rx/tx queue pairs to be equal to the number of
> vcpus. To maximize the performance under these per-cpu rx/tx queue pairs, some
> optimizations were introduced:
>
> - Txq selection is based on the processor id in order to avoid contending for a lock
> whose owner may exit to the host.
> - Since the rxq/txq are per-cpu, the affinity hint is set to the cpu that owns
> the queue pair.
>
> Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
...
>
> static int virtnet_probe(struct virtio_device *vdev)
> {
> - int err;
> + int i, err;
> struct net_device *dev;
> struct virtnet_info *vi;
> + u16 num_queues, num_queue_pairs;
> +
> + /* Find if host supports multiqueue virtio_net device */
> + err = virtio_config_val(vdev, VIRTIO_NET_F_MULTIQUEUE,
> + offsetof(struct virtio_net_config,
> + num_queues), &num_queues);
> +
> + /* We need atleast 2 queue's */
s/atleast/at least/
> + if (err || num_queues < 2)
> + num_queues = 2;
> + if (num_queues > MAX_QUEUES * 2)
> + num_queues = MAX_QUEUES;
num_queues = MAX_QUEUES * 2;
MAX_QUEUES is the limit for RX or TX queues individually.
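
To make the suggested clamp concrete, here is a small standalone sketch. The
MAX_QUEUES value and the helper name clamp_num_queues() are assumptions for
illustration only; the logic follows the quoted hunk with the upper bound
corrected to MAX_QUEUES * 2.

```c
#include <stdint.h>

#define MAX_QUEUES 256	/* assumed value, for illustration only */

/* Clamp the total queue count (rx + tx) read from config space.
 * err is non-zero when the value could not be read.
 */
static uint16_t clamp_num_queues(int err, uint16_t num_queues)
{
	/* At least one rx and one tx queue are needed. */
	if (err || num_queues < 2)
		num_queues = 2;

	/* MAX_QUEUES bounds rx and tx separately, so the total is twice that. */
	if (num_queues > MAX_QUEUES * 2)
		num_queues = MAX_QUEUES * 2;

	return num_queues;
}
```

With the original "num_queues = MAX_QUEUES;" an oversized value would be cut
to MAX_QUEUES total, i.e. only MAX_QUEUES / 2 pairs after the division.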
> +
> + num_queue_pairs = num_queues / 2;
...
--
Amos.
^ permalink raw reply