Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next 2/2] net/mlx4: Revert "mlx4: set maximal number of default RSS queues"
From: Ido Shamai @ 2014-01-15 12:49 UTC (permalink / raw)
  To: Sathya Perla, Yuval Mintz, Or Gerlitz, Or Gerlitz
  Cc: Amir Vadai, David S. Miller, netdev@vger.kernel.org,
	Eugenia Emantayev, Ido Shamay
In-Reply-To: <CF9D1877D81D214CB0CA0669EFAE020C26B83E12@CMEXMB1.ad.emulex.com>

On 1/15/2014 2:46 PM, Sathya Perla wrote:
>> -----Original Message-----
>> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf
>> Of Ido Shamai
>>
>> On 1/2/2014 12:27 PM, Yuval Mintz wrote:
>>>>>> Going back to your original commit 16917b87a "net-next: Add
>>>>>> netif_get_num_default_rss_queues" I am still not clear why we want
>>>>>>
>>>>>> 1. why we want a common default to all MQ devices?
>>>>> Although networking benefits from multiple Interrupt vectors
>>>>> (enabling more rings, better performance, etc.), bounding this
>>>>> number only to the number of cpus is unreasonable as it strains
>>>>> system resources; e.g., consider a 40-cpu server - we might wish
>>>>> to have 40 vectors per device, but that means that connecting
>>>>> several devices to the same server might cause other functions
>>>>> to fail probe as they will no longer be able to acquire interrupt
>>>>> vectors of their own.
>>>>
>>>> Modern servers which have tens of CPUs typically have thousands of MSI-X
>>>> vectors which means you should be easily able to plug four cards into a
>>>> server with 64 cores which will consume 256 out of the 1-4K vectors out
>>>> there. Anyway, let me continue your approach - how about raising the
>>>> default hard limit to 16 or having it as the number of cores @ the numa
>>>> node where the card is plugged?
>>>
>>> I think an additional issue was memory consumption -
>>> additional interrupts --> additional allocated memory (for Rx rings).
>>> And I do know the issues were real - we've had complains about devices
>>> failing to load due to lack of resources (not all servers in the world are
>>> top of the art).
>>>
>>> Anyway, I believe 8/16 are simply strict limitations without any true meaning;
>>> To judge what's more important, default `slimness' or default performance
>>> is beyond me.
>>> Perhaps the numa approach will prove beneficial (and will make some sense).
>>
>> After reviewing all that was said, I feel there is no need to enforce
>> vendors with this strict limitation without any true meaning.
>>
>> The reverted commit you applied forces the driver to use 8 rings at max
>> at all time, without the possibility to change in flight using ethtool,
>> as it's enforced on the PCI driver at module init (restarting the en
>> driver with different of requested rings will not affect).
>> So it's crucial for performance oriented applications using mlx4_en.
>
> The number of RSS/RX rings used by a driver can be increased (up to the HW supported value)
> at runtime using set-channels ethtool interface.
Not in this case, see my comment above: as it's enforced on the PCI 
driver at module init.
set-channels interface in our case will not change this limitation, but 
only up to it.

^ permalink raw reply

* RE: [PATCH net-next 2/2] net/mlx4: Revert "mlx4: set maximal number of default RSS queues"
From: Yuval Mintz @ 2014-01-15 12:54 UTC (permalink / raw)
  To: Ido Shamai, Or Gerlitz, Or Gerlitz
  Cc: Amir Vadai, David S. Miller, netdev@vger.kernel.org,
	Eugenia Emantayev, Ido Shamay
In-Reply-To: <52D67BFF.6070102@dev.mellanox.co.il>

> > Anyway, I believe 8/16 are simply strict limitations without any true
> meaning;
> > To judge what's more important, default `slimness' or default performance
> > is beyond me.
> > Perhaps the numa approach will prove beneficial (and will make some
> sense).
> 
> After reviewing all that was said, I feel there is no need to enforce
> vendors with this strict limitation without any true meaning.
> 
> The reverted commit you applied forces the driver to use 8 rings at max
> at all time, without the possibility to change in flight using ethtool,
> as it's enforced on the PCI driver at module init (restarting the en
> driver with different of requested rings will not affect).
> So it's crucial for performance oriented applications using mlx4_en.

The rational is to prevent default misusage of resources, be them irq vectors
memories for rings.
Notice this is just the default - You can always re-request interrupts if the
user explicitly requests a large number of rings;
Although, if the driver is incapable of that (e.g., hw limitations), perhaps you
should allocate a larger number of irq vectors during init and use a limitation
on your default number of rings 
(that's assuming that irq vectors are really inexpensive on all machines).

> Going through all Ethernet vendors I don't see this limitation enforced,
> so this limitation has no true meaning (no fairness).
> I think this patch should go in as is.
> Ethernet vendors should use it this limitation when they desire.

Might be true, but two wrongs don't make a right.
I believe that either this limitation should be mandatory, or the functionality
Should Not be included in the kernel as communal code and each driver
should do as it pleases.

^ permalink raw reply

* Re: Re: Route exceptions for IPv6 routes?
From: Hannes Frederic Sowa @ 2014-01-15 13:04 UTC (permalink / raw)
  To: Simon Schneider; +Cc: netdev
In-Reply-To: <trinity-6d5496d3-c58c-4b56-a0a8-777473467042-1389788725698@3capp-gmx-bs13>

Hi!

On Wed, Jan 15, 2014 at 01:25:25PM +0100, Simon Schneider wrote:
> thanks for the answer.
> 
> Sorry, I'm not so familiar with the implementation details.
> 
> What are nh-exceptions?

Nexthop exceptions, they store e.g. pmtu information in IPv4.

> I understand the current IPv6 implementation has some performance drawbacks. Correct?
> 
> I'm especially interested in 
> a) support for path MTUs

They are stored in the RTF_CACHED leafs of the routing table.

> b) IPv6 policy-based routing (can I expect the same results from "ip -6 rule" commands that I get from "ip rule" commands?)

Yes, you can. And they have nearly the same implementation overhead.

Additionally you can add routes which match a specific source and handle them
directly, this is a feature only in IPv6:

E.g. ip -6 route add .... from ... via ...

Greetings,

  Hannes

^ permalink raw reply

* Re: [PATCH 1/2 v2] ixgbe: define IXGBE_MAX_VFS_DRV_LIMIT macro and cleanup const 63
From: Ethan Zhao @ 2014-01-15 13:05 UTC (permalink / raw)
  To: Brown, Aaron F
  Cc: Kirsher, Jeffrey T, Brandeburg, Jesse, Allan, Bruce W,
	Wyborny, Carolyn, davem@davemloft.net,
	e1000-devel@lists.sourceforge.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <309B89C4C689E141A5FF6A0C5FB2118B7312CEE1@ORSMSX101.amr.corp.intel.com>

Aaron,
   I will check the patch and make it pass the building, seems encoding issue.

Thanks,
Ethan

On Wed, Jan 15, 2014 at 11:46 AM, Brown, Aaron F
<aaron.f.brown@intel.com> wrote:
> On Fri, 2013-12-27 at 01:02 -0800, Jeff Kirsher wrote:
>> On Wed, 2013-12-25 at 00:12 +0800, Ethan Zhao wrote:
>> > Because ixgbe driver limit the max number of VF functions could be
>> > enabled
>> > to 63, so define one macro IXGBE_MAX_VFS_DRV_LIMIT and cleanup the
>> > const 63
>> > in code.
>> >
>> > v2: fix a typo.
>> >
>> > Signed-off-by: Ethan Zhao <ethan.kernel@gmail.com>
>> > ---
>> >  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  | 4 ++--
>> >  drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c | 5 +++--
>> >  drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h | 5 +++++
>> >  3 files changed, 10 insertions(+), 4 deletions(-)
>>
>> Added to my queue, thanks Ethan!
>
> Hi Ethan,
>
> Did Jeff contact you about this failing to compile?  I'm currently
> providing vacation covering for him and we found this was failing to
> compile just before he left.  We captured the failure in our notes for
> this but there is no comment on if you were contacted or not.
>
> Regardless, when I apply this patch (with or without 2-2) we get the
> following error on a compilation attempt: Here's the error:
> --------------------------------------------------------
> Here's the error:
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c: In function
> "ixgbe_sw_init":
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:5033: error: stray "\357"
> in program
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:5033: error: stray "\274"
> in program
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:5033: error: stray "\215"
> in program
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:5033: error: expected ")"
> before numeric constant
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c: In function
> "ixgbe_probe":
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:7977: error: stray "\357"
> in program
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:7977: error: stray "\274"
> in program
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:7977: error: stray "\215"
> in program
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:7977: error: expected ")"
> before numeric constant
> make[5]: *** [drivers/net/ethernet/intel/ixgbe/ixgbe_main.o] Error 1
> make[5]: *** Waiting for unfinished jobs....
> make[4]: *** [drivers/net/ethernet/intel/ixgbe] Error 2
> make[4]: *** Waiting for unfinished jobs....
> make[3]: *** [drivers/net/ethernet/intel] Error 2
> make[2]: *** [drivers/net/ethernet] Error 2
> make[1]: *** [drivers/net] Error 2
> make: *** [drivers] Error 2
> --------------------------------------------------------
>
> Thanks,
> Aaron
>

^ permalink raw reply

* Re: [PATCH net-next 2/2] net/mlx4: Revert "mlx4: set maximal number of default RSS queues"
From: Ido Shamai @ 2014-01-15 13:15 UTC (permalink / raw)
  To: Yuval Mintz, Or Gerlitz, Or Gerlitz
  Cc: Amir Vadai, David S. Miller, netdev@vger.kernel.org,
	Eugenia Emantayev, Ido Shamay
In-Reply-To: <979A8436335E3744ADCD3A9F2A2B68A52AF2849F@SJEXCHMB10.corp.ad.broadcom.com>

On 1/15/2014 2:54 PM, Yuval Mintz wrote:
>>> Anyway, I believe 8/16 are simply strict limitations without any true
>> meaning;
>>> To judge what's more important, default `slimness' or default performance
>>> is beyond me.
>>> Perhaps the numa approach will prove beneficial (and will make some
>> sense).
>>
>> After reviewing all that was said, I feel there is no need to enforce
>> vendors with this strict limitation without any true meaning.
>>
>> The reverted commit you applied forces the driver to use 8 rings at max
>> at all time, without the possibility to change in flight using ethtool,
>> as it's enforced on the PCI driver at module init (restarting the en
>> driver with different of requested rings will not affect).
>> So it's crucial for performance oriented applications using mlx4_en.
>
> The rational is to prevent default misusage of resources, be them irq vectors
> memories for rings.
> Notice this is just the default - You can always re-request interrupts if the
> user explicitly requests a large number of rings;
> Although, if the driver is incapable of that (e.g., hw limitations), perhaps you
> should allocate a larger number of irq vectors during init and use a limitation
> on your default number of rings
> (that's assuming that irq vectors are really inexpensive on all machines).

I fully agree with you on that.
We will send a new patch that will limit the default number of rings to 
this limitation, and could be changed using set-channels ethtool 
interface. Number of IRQ vectors will be (max) of number of CPUs.
Yuval, thanks for clarifying.

>> Going through all Ethernet vendors I don't see this limitation enforced,
>> so this limitation has no true meaning (no fairness).
>> I think this patch should go in as is.
>> Ethernet vendors should use it this limitation when they desire.
>
> Might be true, but two wrongs don't make a right.
> I believe that either this limitation should be mandatory, or the functionality
> Should Not be included in the kernel as communal code and each driver
> should do as it pleases.
I agree, but this is a different discussion that should be held.
I agree, but this is a different discussion, which I hope to be held 
sometimes in the near future.

^ permalink raw reply

* Re: [PATCH 14/14] batman-adv: drop dependency against CRC16
From: Antonio Quartulli @ 2014-01-15 13:16 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n
In-Reply-To: <1389652298-472-15-git-send-email-antonio@meshcoding.com>

[-- Attachment #1: Type: text/plain, Size: 971 bytes --]

On 13/01/14 23:31, Antonio Quartulli wrote:
> The crc16 functionality is not used anymore, therefore
> we can safely remove the dependency in the Kbuild file.
> 
> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>

David,

we just realized that this dependency is still needed by our module.
(also thanks to Fengguang Wu build bot!)

Could you please revert this patch only?
Or do you want me to send the "revert" on the netdev ml as a patch (or
pull request)?


Thanks a lot!


> ---
>  net/batman-adv/Kconfig | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/net/batman-adv/Kconfig b/net/batman-adv/Kconfig
> index fa780b7..2b2dc47 100644
> --- a/net/batman-adv/Kconfig
> +++ b/net/batman-adv/Kconfig
> @@ -5,7 +5,6 @@
>  config BATMAN_ADV
>  	tristate "B.A.T.M.A.N. Advanced Meshing Protocol"
>  	depends on NET
> -	select CRC16
>  	select LIBCRC32C
>          default n
>  	help
> 


-- 
Antonio Quartulli




[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: [PATCH net-next] sctp: create helper function to enable|disable sackdelay
From: Neil Horman @ 2014-01-15 14:10 UTC (permalink / raw)
  To: Wang Weidong; +Cc: Vlad Yasevich, David Miller, linux-sctp, netdev
In-Reply-To: <52D653B1.4040708@huawei.com>

On Wed, Jan 15, 2014 at 05:24:01PM +0800, Wang Weidong wrote:
> add sctp_spp_sackdelay_{enable|disable} helper function for
> avoiding code duplication. 
> 
> Signed-off-by: Wang Weidong <wangweidong1@huawei.com>

Acked-by: Neil Horman <nhorman@tuxdriver.com>

^ permalink raw reply

* [PATCH 1/2 v3] ixgbe: define IXGBE_MAX_VFS_DRV_LIMIT macro and cleanup const 63
From: Ethan Zhao @ 2014-01-15 14:12 UTC (permalink / raw)
  To: aaron.f.brown, jeffrey.t.kirsher, jesse.brandeburg, bruce.w.allan,
	carolyn.wyborny, davem
  Cc: e1000-devel, netdev, linux-kernel, Ethan Zhao

Because ixgbe driver limit the max number of VF functions could be enabled
to 63, so define one macro IXGBE_MAX_VFS_DRV_LIMIT and cleanup the const 63
in code.

v2: fix a typo.
v3: fix a encoding issue.

Signed-off-by: Ethan Zhao <ethan.kernel@gmail.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  | 4 ++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c | 5 +++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h | 5 +++++
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 0ade0cd..47e9b44 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -4818,7 +4818,7 @@ static int ixgbe_sw_init(struct ixgbe_adapter *adapter)
 #ifdef CONFIG_PCI_IOV
 	/* assign number of SR-IOV VFs */
 	if (hw->mac.type != ixgbe_mac_82598EB)
-		adapter->num_vfs = (max_vfs > 63) ? 0 : max_vfs;
+		adapter->num_vfs = (max_vfs > IXGBE_MAX_VFS_DRV_LIMIT) ? 0 : max_vfs;
 
 #endif
 	/* enable itr by default in dynamic mode */
@@ -7640,7 +7640,7 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	ixgbe_init_mbx_params_pf(hw);
 	memcpy(&hw->mbx.ops, ii->mbx_ops, sizeof(hw->mbx.ops));
 	ixgbe_enable_sriov(adapter);
-	pci_sriov_set_totalvfs(pdev, 63);
+	pci_sriov_set_totalvfs(pdev, IXGBE_MAX_VFS_DRV_LIMIT);
 skip_sriov:
 
 #endif
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index 276d7b1..6925979 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -152,7 +152,8 @@ void ixgbe_enable_sriov(struct ixgbe_adapter *adapter)
 		 * physical function.  If the user requests greater thn
 		 * 63 VFs then it is an error - reset to default of zero.
 		 */
-		adapter->num_vfs = min_t(unsigned int, adapter->num_vfs, 63);
+		adapter->num_vfs = min_t(unsigned int,
+					adapter->num_vfs, IXGBE_MAX_VFS_DRV_LIMIT);
 
 		err = pci_enable_sriov(adapter->pdev, adapter->num_vfs);
 		if (err) {
@@ -259,7 +260,7 @@ static int ixgbe_pci_sriov_enable(struct pci_dev *dev, int num_vfs)
 	 * PF.  The PCI bus driver already checks for other values out of
 	 * range.
 	 */
-	if (num_vfs > 63) {
+	if (num_vfs > IXGBE_MAX_VFS_DRV_LIMIT) {
 		err = -EPERM;
 		goto err_out;
 	}
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h
index 4713f9f..2593666 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h
@@ -28,6 +28,11 @@
 #ifndef _IXGBE_SRIOV_H_
 #define _IXGBE_SRIOV_H_
 
+/*  ixgbe driver limit the max number of VFs could be enabled to
+ *  63 (IXGBE_MAX_VF_FUNCTIONS - 1)
+ */
+#define IXGBE_MAX_VFS_DRV_LIMIT  (IXGBE_MAX_VF_FUNCTIONS - 1)
+
 void ixgbe_restore_vf_multicasts(struct ixgbe_adapter *adapter);
 void ixgbe_msg_task(struct ixgbe_adapter *adapter);
 int ixgbe_vf_configuration(struct pci_dev *pdev, unsigned int event_mask);
-- 
1.8.3.4 (Apple Git-47)

^ permalink raw reply related

* [PATCH 2/2] ixgbe: set driver_max_VFs should be done before enabling SRIOV
From: Ethan Zhao @ 2014-01-15 14:15 UTC (permalink / raw)
  To: aaron.f.brown, jeffrey.t.kirsher, jesse.brandeburg, bruce.w.allan,
	carolyn.wyborny, davem
  Cc: e1000-devel, netdev, Ethan Zhao, linux-kernel

commit 43dc4e01 Limit number of reported VFs to device specific value
It doesn't work and always returns -EBUSY because VFs ware already enabled.

ixgbe_enable_sriov()
        pci_enable_sriov()
                sriov_enable()
                {
                ... ..
                iov->ctrl |= PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE;
                pci_cfg_access_lock(dev);
                ... ...
                }

pci_sriov_set_totalvfs()
{
... ...
if (dev->sriov->ctrl & PCI_SRIOV_CTRL_VFE)
                return -EBUSY;
...
}

So should set driver_max_VFs with pci_sriov_set_totalvfs() before
enable VFs with ixgbe_enable_sriov().

Signed-off-by: Ethan Zhao <ethan.kernel@gmail.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 47e9b44..bfc8e94 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -7639,8 +7639,8 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	/* Mailbox */
 	ixgbe_init_mbx_params_pf(hw);
 	memcpy(&hw->mbx.ops, ii->mbx_ops, sizeof(hw->mbx.ops));
-	ixgbe_enable_sriov(adapter);
 	pci_sriov_set_totalvfs(pdev, IXGBE_MAX_VFS_DRV_LIMIT);
+	ixgbe_enable_sriov(adapter);
 skip_sriov:
 
 #endif
-- 
1.8.3.4 (Apple Git-47)


------------------------------------------------------------------------------
CenturyLink Cloud: The Leader in Enterprise Cloud Services.
Learn Why More Businesses Are Choosing CenturyLink Cloud For
Critical Workloads, Development Environments & Everything In Between.
Get a Quote or Start a Free Trial Today. 
http://pubads.g.doubleclick.net/gampad/clk?id=119420431&iu=/4140/ostg.clktrk
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply related

* Re: [Xen-devel] [PATCH net-next] xen-netfront: clean up code in xennet_release_rx_bufs
From: annie li @ 2014-01-15 14:15 UTC (permalink / raw)
  To: Andrew Bennieston; +Cc: Wei Liu, netdev, ian.campbell, xen-devel
In-Reply-To: <52D66ADF.9070401@citrix.com>


On 2014-1-15 19:02, Andrew Bennieston wrote:
> On 15/01/14 10:07, Wei Liu wrote:
>> On Fri, Jan 10, 2014 at 06:48:38AM +0800, Annie Li wrote:
>>> Current netfront only grants pages for grant copy, not for grant 
>>> transfer, so
>>> remove corresponding transfer code and add receiving copy code in
>>> xennet_release_rx_bufs.
>>>
>>
>> This path seldom gets call -- not that many people unload xen-netfront
>> driver. If Annie has tested this patch and it works as expected I think
>> it's fine.
>>
> In XenServer we have seen a number of cases where unplugging and 
> replugging VIFs results in leakage of grant references, eventually 
> leading to a case where you cannot plug a VIF (after ~ 400 such 
> cycles)...
>
> It's worth pointing out, as far as this patch is concerned, that 
> gnttab_end_foreign_access() can fail, 

Just like what Wei mentioned, it is gnttab_end_foreign_access_ref here, 
right?

> which is not taken into account here.

Good point, gnttab_end_foreign_access_ref fails for grant which is in use.

Thanks
Annie
>
> Andrew.
>
>> I'm not netfront maintainer but I'm happy to add
>> Acked-by: Wei Liu <wei.liu2@citrix.com>
>> if Annie confirms she's tested this patch.
>>
>> Wei.
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
>>
>

^ permalink raw reply

* Re: [RFC] sysfs_rename_link() and its usage
From: Tejun Heo @ 2014-01-15 14:16 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Veaceslav Falico, Greg KH, linux-kernel, netdev
In-Reply-To: <87iotmc9ic.fsf@xmission.com>

Hey, Veaceslav, Eric.

On Tue, Jan 14, 2014 at 05:35:23PM -0800, Eric W. Biederman wrote:
> >>> >>This works like a charm. However, if I want to use (obviously, with the
> >>> >>symlink present):
> >>> >>
> >>> >>sysfs_rename_link(&(a->dev.kobj), &(b->dev.kobj), oldname, newname);
> >>> >
> >>> >You forgot the namespace option to this call, what kernel version are
> >>> >you using here?
> >>>
> >>> It's git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next ,
> >>> 3.13-rc6 with some networking patches on top of it.

Does this work on 3.12?  How about Greg's driver-core-next?  Do you
have a minimal test case that I can use to reproduce the issue?

> sysfs_rename_link was originally written to infer figure everything out
> based on the target device.  Because symlinks only make sense being in
> the same namespace of their devices.  Tejun broke the inference and now
> you are having hard time using the code.

I wouldn't be surprised if I broke it but

> The commit that broke things was:
> 
> commit 4b30ee58ee64c64f59fd876e4afa6ed82caef3a4
> Author: Tejun Heo <tj@kernel.org>
> Date:   Wed Sep 11 22:29:06 2013 -0400
> 
>     sysfs: remove ktype->namespace() invocations in symlink code

I'm not so sure about the above commit.  The warning is from rename
failing to locate the existing node while the above patch updates the
target ns to rename to.  The old_ns part stays the same before and
after the commit.

Veaceslav, please confirm whether the issue is reproducible w/ v3.12.

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [Xen-devel] [PATCH net-next] xen-netfront: clean up code in xennet_release_rx_bufs
From: annie li @ 2014-01-15 14:16 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, netdev, wei.liu2, ian.campbell
In-Reply-To: <52D66F11.204@citrix.com>


On 2014-1-15 19:20, David Vrabel wrote:
> On 09/01/14 22:48, Annie Li wrote:
>> Current netfront only grants pages for grant copy, not for grant transfer, so
>> remove corresponding transfer code and add receiving copy code in
>> xennet_release_rx_bufs.
> While netfront only supports a copying backend, I don't see anything
> preventing the backend from retaining mappings to netfront's Rx buffers...

Right. This does not prevent backend from mappings.
Maybe my description is not clear. What I mean here is based on old 
2.6.18 netfront which uses "copying_receiver" to tell netback whether rx 
requires grant copy. Probably changing "grant copy" above into "grant 
access" vs "grant transfer" is more precise. And the 
"gnttab_end_foreign_transfer_ref" is the unnecessary code kept from old 
netfront.

>
>> Signed-off-by: Annie Li <Annie.li@oracle.com>
>> ---
>>   drivers/net/xen-netfront.c |   60 ++-----------------------------------------
>>   1 files changed, 3 insertions(+), 57 deletions(-)
>>
>> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
>> index e59acb1..692589e 100644
>> --- a/drivers/net/xen-netfront.c
>> +++ b/drivers/net/xen-netfront.c
>> @@ -1134,78 +1134,24 @@ static void xennet_release_tx_bufs(struct netfront_info *np)
>>   
>>   static void xennet_release_rx_bufs(struct netfront_info *np)
>>   {
> [...]
>> -		mfn = gnttab_end_foreign_transfer_ref(ref);
>> +		gnttab_end_foreign_access_ref(ref, 0);
> ... the gnttab_end_foreign_access_ref() may then fail and...
>
>>   		gnttab_release_grant_reference(&np->gref_rx_head, ref);
>>   		np->grant_rx_ref[id] = GRANT_INVALID_REF;
> [...]
>> +		kfree_skb(skb);
> ... this could then potentially free pages that the backend still has
> mapped.  If the pages are then reused, this would leak information to
> the backend.

Yes, it is possible. But doing kfree_skb is right thing from netfront 
point of view.

>
> Since only a buggy backend would result in this, leaking the skbs and
> grant refs would be acceptable here.

This is the same thing for tx side which uses similar process.

>    I would also print an error.

You mean add some print log here? Is it necessary?

Thanks
Annie
>
> While checking blkfront for how it handles this, it also doesn't appear
> to do the right thing either.
>
> David
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net] bpf: do not use reciprocal divide
From: Eric Dumazet @ 2014-01-15 14:16 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Hannes Frederic Sowa, netdev, dborkman, darkjames-ws,
	Mircea Gherzan, Russell King, Matt Evans, Martin Schwidefsky
In-Reply-To: <20140115080007.GA6638@osiris>

On Wed, 2014-01-15 at 09:00 +0100, Heiko Carstens wrote:
> On Tue, Jan 14, 2014 at 11:02:41PM -0800, Eric Dumazet wrote:
> > diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
> > index 16871da37371..e349dc7d0992 100644
> > --- a/arch/s390/net/bpf_jit_comp.c
> > +++ b/arch/s390/net/bpf_jit_comp.c
> > @@ -371,11 +371,11 @@ static int bpf_jit_insn(struct bpf_jit *jit, struct sock_filter *filter,
> >  		/* dr %r4,%r12 */
> >  		EMIT2(0x1d4c);
> >  		break;
> > -	case BPF_S_ALU_DIV_K: /* A = reciprocal_divide(A, K) */
> > -		/* m %r4,<d(K)>(%r13) */
> > -		EMIT4_DISP(0x5c40d000, EMIT_CONST(K));
> > -		/* lr %r5,%r4 */
> > -		EMIT2(0x1854);
> > +	case BPF_S_ALU_DIV_K: /* A /= K */
> > +		/* lhi %r4,0 */
> > +		EMIT4(0xa7480000);
> > +		/* d %r4,<d(K)>(%r13) */
> > +		EMIT4_DISP(0x5d40d000, EMIT_CONST(K));
> >  		break;
> 
> The s390 part looks good.
> 
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 01b780856db2..ad30d626a5bd 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -166,7 +165,7 @@ unsigned int sk_run_filter(const struct sk_buff *skb,
> >  			A /= X;
> >  			continue;
> >  		case BPF_S_ALU_DIV_K:
> > -			A = reciprocal_divide(A, K);
> > +			A /= K;
> >  			continue;
> >  		case BPF_S_ALU_MOD_X:
> >  			if (X == 0)
> > @@ -553,11 +552,6 @@ int sk_chk_filter(struct sock_filter *filter, unsigned int flen)
> >  		/* Some instructions need special checks */
> >  		switch (code) {
> >  		case BPF_S_ALU_DIV_K:
> > -			/* check for division by zero */
> > -			if (ftest->k == 0)
> > -				return -EINVAL;
> > -			ftest->k = reciprocal_value(ftest->k);
> > -			break;
> 
> Are you sure you want to remove the k == 0 check? Is there something
> else that would prevent a division by zero?

This is done by factoring the two cases, modulo and divide :

 vi +553 net/core/filter.c 

                /* Some instructions need special checks */
                switch (code) {
                case BPF_S_ALU_DIV_K:
                case BPF_S_ALU_MOD_K:
                        /* check for division by zero */
                        if (ftest->k == 0)
                                return -EINVAL;
                        break;

^ permalink raw reply

* Re: [Xen-devel] [PATCH net-next] xen-netfront: clean up code in xennet_release_rx_bufs
From: annie li @ 2014-01-15 14:17 UTC (permalink / raw)
  To: David Vrabel; +Cc: Wei Liu, xen-devel, netdev, ian.campbell
In-Reply-To: <52D67677.4050407@citrix.com>


On 2014-1-15 19:52, David Vrabel wrote:
> On 15/01/14 11:42, Wei Liu wrote:
>> On Wed, Jan 15, 2014 at 11:20:49AM +0000, David Vrabel wrote:
>>> On 09/01/14 22:48, Annie Li wrote:
>>>> Current netfront only grants pages for grant copy, not for grant transfer, so
>>>> remove corresponding transfer code and add receiving copy code in
>>>> xennet_release_rx_bufs.
>>> While netfront only supports a copying backend, I don't see anything
>>> preventing the backend from retaining mappings to netfront's Rx buffers...
>>>
>> Correct.
>>
>>>> Signed-off-by: Annie Li <Annie.li@oracle.com>
>>>> ---
>>>>   drivers/net/xen-netfront.c |   60 ++-----------------------------------------
>>>>   1 files changed, 3 insertions(+), 57 deletions(-)
>>>>
>>>> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
>>>> index e59acb1..692589e 100644
>>>> --- a/drivers/net/xen-netfront.c
>>>> +++ b/drivers/net/xen-netfront.c
>>>> @@ -1134,78 +1134,24 @@ static void xennet_release_tx_bufs(struct netfront_info *np)
>>>>   
>>>>   static void xennet_release_rx_bufs(struct netfront_info *np)
>>>>   {
>>> [...]
>>>> -		mfn = gnttab_end_foreign_transfer_ref(ref);
>>>> +		gnttab_end_foreign_access_ref(ref, 0);
>>> ... the gnttab_end_foreign_access_ref() may then fail and...
>>>
>> Oh, I see. Andrew was actually referencing this function. Yes, it can
>> fail. Since he omitted "_ref" I looked at the other function when I
>> replied to him...
>>
>>>>   		gnttab_release_grant_reference(&np->gref_rx_head, ref);
>>>>   		np->grant_rx_ref[id] = GRANT_INVALID_REF;
>>> [...]
>>>> +		kfree_skb(skb);
>>> ... this could then potentially free pages that the backend still has
>>> mapped.  If the pages are then reused, this would leak information to
>>> the backend.
>>>
>>> Since only a buggy backend would result in this, leaking the skbs and
>>> grant refs would be acceptable here.  I would also print an error.
>>>
>> How about using gnttab_end_foreign_access. The deferred queue looks like
>> a right solution -- pending page won't get freed until gref is
>> quiescent.
> This is more like the correct approach but I don't think it still quite
> right.  The skb owns the pages so we don't want
> gnttab_end_foreign_access() to free them as freeing the skb will attempt
> to free them again.
>
> Having gnttab_end_foreign_access() do a free just looks odd to me, the
> free isn't paired with any alloc in the grant table code.
>
> It seems more logical to me that granting access takes an additional
> page ref, and then ending access releases that ref.

I am thinking of two ways, and they can be implemented in new patches.
1. If gnttab_end_foreign_access_ref succeeds, then kfree_skb is called 
to free skb. Otherwise, using gnttab_end_foreign_access to release ref 
and pages.
2. Add a similar deferred way of gnttab_end_foreign_access in 
gnttab_end_foreign_access_ref.

Thanks
Annie

^ permalink raw reply

* [RFC PATCH net-next 0/3] virtio_net: add aRFS support
From: Zhi Yong Wu @ 2014-01-15 14:20 UTC (permalink / raw)
  To: netdev; +Cc: therbert, edumazet, davem, Zhi Yong Wu

From: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>

HI, folks

The patchset is trying to integrate aRFS support to virtio_net. In this case,
aRFS will be used to select the RX queue. To make sure that it's going ahead
in the correct direction, although it is still one RFC and isn't tested, it's
post out ASAP. Any comment are appreciated, thanks.

If anyone is interested in playing with it, you can get this patchset from my
dev git on github:
  git://github.com/wuzhy/kernel.git virtnet_rfs

Zhi Yong Wu (3):
  virtio_pci: Introduce one new config api vp_get_vq_irq()
  virtio_net: Introduce one dummy function virtnet_filter_rfs()
  virtio-net: Add accelerated RFS support

 drivers/net/virtio_net.c      |   67 ++++++++++++++++++++++++++++++++++++++++-
 drivers/virtio/virtio_pci.c   |   11 +++++++
 include/linux/virtio_config.h |   12 +++++++
 3 files changed, 89 insertions(+), 1 deletions(-)

-- 
1.7.6.5

^ permalink raw reply

* [RFC PATCH net-next 1/3] virtio_pci: Introduce one new config api vp_get_vq_irq()
From: Zhi Yong Wu @ 2014-01-15 14:20 UTC (permalink / raw)
  To: netdev; +Cc: therbert, edumazet, davem, Zhi Yong Wu
In-Reply-To: <1389795654-28381-1-git-send-email-zwu.kernel@gmail.com>

From: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>

This config api is used to get irq number of the given virtqueue.

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
---
 drivers/virtio/virtio_pci.c   |   11 +++++++++++
 include/linux/virtio_config.h |   12 ++++++++++++
 2 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index a37c699..85f1aa6 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -654,6 +654,16 @@ static int vp_set_vq_affinity(struct virtqueue *vq, int cpu)
 	return 0;
 }
 
+static int vp_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	struct virtio_pci_vq_info *info = vq->priv;
+
+	if (vp_dev->msix_enabled)
+		return vp_dev->msix_entries[info->msix_vector].vector;
+	return -1;
+}
+
 static const struct virtio_config_ops virtio_pci_config_ops = {
 	.get		= vp_get,
 	.set		= vp_set,
@@ -666,6 +676,7 @@ static const struct virtio_config_ops virtio_pci_config_ops = {
 	.finalize_features = vp_finalize_features,
 	.bus_name	= vp_bus_name,
 	.set_vq_affinity = vp_set_vq_affinity,
+	.get_vq_irq	= vp_get_vq_irq,
 };
 
 static void virtio_pci_release_dev(struct device *_d)
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index e8f8f71..b70fc47 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -51,6 +51,9 @@
  *      This returns a pointer to the bus name a la pci_name from which
  *      the caller can then copy.
  * @set_vq_affinity: set the affinity for a virtqueue.
+ * @get_vq_irq: get irq of the given virtqueue
+ *	vdev: the virtio_device
+ *	vq: the virtqueue
  */
 typedef void vq_callback_t(struct virtqueue *);
 struct virtio_config_ops {
@@ -70,6 +73,7 @@ struct virtio_config_ops {
 	void (*finalize_features)(struct virtio_device *vdev);
 	const char *(*bus_name)(struct virtio_device *vdev);
 	int (*set_vq_affinity)(struct virtqueue *vq, int cpu);
+	int (*get_vq_irq)(struct virtio_device *vdev, struct virtqueue *vq);
 };
 
 /* If driver didn't advertise the feature, it will never appear. */
@@ -135,6 +139,14 @@ int virtqueue_set_affinity(struct virtqueue *vq, int cpu)
 	return 0;
 }
 
+static inline
+int virtqueue_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq)
+{
+	if (vdev->config->get_vq_irq)
+		return vdev->config->get_vq_irq(vdev, vq);
+	return -1;
+}
+
 /* Config space accessors. */
 #define virtio_cread(vdev, structname, member, ptr)			\
 	do {								\
-- 
1.7.6.5

^ permalink raw reply related

* [RFC PATCH net-next 2/3] virtio_net: Introduce one dummy function virtnet_filter_rfs()
From: Zhi Yong Wu @ 2014-01-15 14:20 UTC (permalink / raw)
  To: netdev; +Cc: therbert, edumazet, davem, Zhi Yong Wu
In-Reply-To: <1389795654-28381-1-git-send-email-zwu.kernel@gmail.com>

From: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
---
 drivers/net/virtio_net.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 7b17240..046421c 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1295,6 +1295,14 @@ static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
 	return 0;
 }
 
+#ifdef CONFIG_RFS_ACCEL
+static int virtnet_filter_rfs(struct net_device *net_dev,
+		const struct sk_buff *skb, u16 rxq_index, u32 flow_id)
+{
+	return 0;
+}
+#endif /* CONFIG_RFS_ACCEL */
+
 static const struct net_device_ops virtnet_netdev = {
 	.ndo_open            = virtnet_open,
 	.ndo_stop   	     = virtnet_close,
@@ -1309,6 +1317,9 @@ static const struct net_device_ops virtnet_netdev = {
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller = virtnet_netpoll,
 #endif
+#ifdef CONFIG_RFS_ACCEL
+	.ndo_rx_flow_steer   = virtnet_filter_rfs,
+#endif
 };
 
 static void virtnet_config_changed_work(struct work_struct *work)
-- 
1.7.6.5

^ permalink raw reply related

* [RFC PATCH net-next 3/3] virtio-net: Add accelerated RFS support
From: Zhi Yong Wu @ 2014-01-15 14:20 UTC (permalink / raw)
  To: netdev; +Cc: therbert, edumazet, davem, Zhi Yong Wu
In-Reply-To: <1389795654-28381-1-git-send-email-zwu.kernel@gmail.com>

From: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>

Accelerated RFS is used to select the RX queue.

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
---
 drivers/net/virtio_net.c |   56 +++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 55 insertions(+), 1 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 046421c..9f4cfa7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -26,6 +26,7 @@
 #include <linux/if_vlan.h>
 #include <linux/slab.h>
 #include <linux/cpu.h>
+#include <linux/cpu_rmap.h>
 
 static int napi_weight = NAPI_POLL_WEIGHT;
 module_param(napi_weight, int, 0444);
@@ -1554,6 +1555,49 @@ err:
 	return ret;
 }
 
+static void virtnet_free_irq_cpu_rmap(struct virtnet_info *vi)
+{
+#ifdef CONFIG_RFS_ACCEL
+	free_irq_cpu_rmap(vi->dev->rx_cpu_rmap);
+	vi->dev->rx_cpu_rmap = NULL;
+#endif
+}
+
+static int virtnet_init_rx_cpu_rmap(struct virtnet_info *vi)
+{
+	int rc = 0;
+
+#ifdef CONFIG_RFS_ACCEL
+	struct virtio_device *vdev = vi->vdev;
+	unsigned int irq;
+	int i;
+
+	if (!vi->affinity_hint_set)
+		goto out;
+
+	vi->dev->rx_cpu_rmap = alloc_irq_cpu_rmap(vi->max_queue_pairs);
+	if (!vi->dev->rx_cpu_rmap) {
+		rc = -ENOMEM;
+		goto out;
+	}
+
+	for (i = 0; i < vi->max_queue_pairs; i++) {
+		irq = virtqueue_get_vq_irq(vdev, vi->rq[i].vq);
+		if (irq == -1)
+			goto failed;
+
+		rc = irq_cpu_rmap_add(vi->dev->rx_cpu_rmap, irq);
+		if (rc) {
+failed:
+			virtnet_free_irq_cpu_rmap(vi);
+			goto out;
+		}
+	}
+out:
+#endif
+	return rc;
+}
+
 static int virtnet_probe(struct virtio_device *vdev)
 {
 	int i, err;
@@ -1671,10 +1715,16 @@ static int virtnet_probe(struct virtio_device *vdev)
 	netif_set_real_num_tx_queues(dev, vi->curr_queue_pairs);
 	netif_set_real_num_rx_queues(dev, vi->curr_queue_pairs);
 
+	err = virtnet_init_rx_cpu_rmap(vi);
+	if (err) {
+		pr_debug("virtio_net: initializing rx cpu rmap failed\n");
+		goto free_vqs;
+	}
+
 	err = register_netdev(dev);
 	if (err) {
 		pr_debug("virtio_net: registering device failed\n");
-		goto free_vqs;
+		goto free_cpu_rmap;
 	}
 
 	/* Last of all, set up some receive buffers. */
@@ -1714,6 +1764,8 @@ static int virtnet_probe(struct virtio_device *vdev)
 free_recv_bufs:
 	free_receive_bufs(vi);
 	unregister_netdev(dev);
+free_cpu_rmap:
+	virtnet_free_irq_cpu_rmap(vi);
 free_vqs:
 	cancel_delayed_work_sync(&vi->refill);
 	virtnet_del_vqs(vi);
@@ -1751,6 +1803,8 @@ static void virtnet_remove(struct virtio_device *vdev)
 
 	unregister_netdev(vi->dev);
 
+	virtnet_free_irq_cpu_rmap(vi);
+
 	remove_vq_common(vi);
 	if (vi->alloc_frag.page)
 		put_page(vi->alloc_frag.page);
-- 
1.7.6.5

^ permalink raw reply related

* Re: [PATCH net] bpf: do not use reciprocal divide
From: Eric Dumazet @ 2014-01-15 14:21 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Martin Schwidefsky, Hannes Frederic Sowa, netdev, dborkman,
	darkjames-ws, Mircea Gherzan, Russell King, Matt Evans
In-Reply-To: <20140115105114.GB6638@osiris>

On Wed, 2014-01-15 at 11:51 +0100, Heiko Carstens wrote:
> On Wed, Jan 15, 2014 at 09:13:22AM +0100, Martin Schwidefsky wrote:
> > On Wed, 15 Jan 2014 09:00:07 +0100
> > Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > 
> > > On Tue, Jan 14, 2014 at 11:02:41PM -0800, Eric Dumazet wrote:
> > > > diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
> > > > index 16871da37371..e349dc7d0992 100644
> > > > --- a/arch/s390/net/bpf_jit_comp.c
> > > > +++ b/arch/s390/net/bpf_jit_comp.c
> > > > @@ -371,11 +371,11 @@ static int bpf_jit_insn(struct bpf_jit *jit, struct sock_filter *filter,
> > > >  		/* dr %r4,%r12 */
> > > >  		EMIT2(0x1d4c);
> > > >  		break;
> > > > -	case BPF_S_ALU_DIV_K: /* A = reciprocal_divide(A, K) */
> > > > -		/* m %r4,<d(K)>(%r13) */
> > > > -		EMIT4_DISP(0x5c40d000, EMIT_CONST(K));
> > > > -		/* lr %r5,%r4 */
> > > > -		EMIT2(0x1854);
> > > > +	case BPF_S_ALU_DIV_K: /* A /= K */
> > > > +		/* lhi %r4,0 */
> > > > +		EMIT4(0xa7480000);
> > > > +		/* d %r4,<d(K)>(%r13) */
> > > > +		EMIT4_DISP(0x5d40d000, EMIT_CONST(K));
> > > >  		break;
> > > 
> > > The s390 part looks good.
> > 
> > Does it? The divide instruction is signed, for the special
> > case of K==1 this can now cause an exception if the quotient
> > gets too large. We should add a check for K==1 and do nothing
> > in this case. With a divisor of at least 2 the result will
> > stay in the limit.
> 
> Indeed. That's quite subtle.

net/core/filter.c does :

A /= K;

Why is this working in generic code (if K == 1), not in s390 one ?

^ permalink raw reply

* Re: [PATCH net] bpf: do not use reciprocal divide
From: Eric Dumazet @ 2014-01-15 14:25 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Martin Schwidefsky, Hannes Frederic Sowa, netdev, dborkman,
	darkjames-ws, Mircea Gherzan, Russell King, Matt Evans
In-Reply-To: <1389795716.31367.333.camel@edumazet-glaptop2.roam.corp.google.com>

On Wed, 2014-01-15 at 06:21 -0800, Eric Dumazet wrote:
> On Wed, 2014-01-15 at 11:51 +0100, Heiko Carstens wrote:
> > On Wed, Jan 15, 2014 at 09:13:22AM +0100, Martin Schwidefsky wrote:
> > > On Wed, 15 Jan 2014 09:00:07 +0100
> > > Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > > 
> > > > On Tue, Jan 14, 2014 at 11:02:41PM -0800, Eric Dumazet wrote:
> > > > > diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
> > > > > index 16871da37371..e349dc7d0992 100644
> > > > > --- a/arch/s390/net/bpf_jit_comp.c
> > > > > +++ b/arch/s390/net/bpf_jit_comp.c
> > > > > @@ -371,11 +371,11 @@ static int bpf_jit_insn(struct bpf_jit *jit, struct sock_filter *filter,
> > > > >  		/* dr %r4,%r12 */
> > > > >  		EMIT2(0x1d4c);
> > > > >  		break;
> > > > > -	case BPF_S_ALU_DIV_K: /* A = reciprocal_divide(A, K) */
> > > > > -		/* m %r4,<d(K)>(%r13) */
> > > > > -		EMIT4_DISP(0x5c40d000, EMIT_CONST(K));
> > > > > -		/* lr %r5,%r4 */
> > > > > -		EMIT2(0x1854);
> > > > > +	case BPF_S_ALU_DIV_K: /* A /= K */
> > > > > +		/* lhi %r4,0 */
> > > > > +		EMIT4(0xa7480000);
> > > > > +		/* d %r4,<d(K)>(%r13) */
> > > > > +		EMIT4_DISP(0x5d40d000, EMIT_CONST(K));
> > > > >  		break;
> > > > 
> > > > The s390 part looks good.
> > > 
> > > Does it? The divide instruction is signed, for the special
> > > case of K==1 this can now cause an exception if the quotient
> > > gets too large. We should add a check for K==1 and do nothing
> > > in this case. With a divisor of at least 2 the result will
> > > stay in the limit.
> > 
> > Indeed. That's quite subtle.
> 
> net/core/filter.c does :
> 
> A /= K;
> 
> Why is this working in generic code (if K == 1), not in s390 one ?

Note that I copied code found in BPF_S_ALU_MOD_K, so this one would need
a fix as well.

^ permalink raw reply

* [PATCH net-next] net: stmmac: fix NULL pointer dereference in stmmac_get_tx_hwtstamp
From: Bruce Liu @ 2014-01-15 14:26 UTC (permalink / raw)
  To: peppe.cavallaro; +Cc: netdev, linux-kernel, Bruce Liu

When timestamping is enabled, stmmac_tx_clean will call
stmmac_get_tx_hwtstamp to get tx TS.
But the skb can be NULL because the last of its tx_skbuff is NULL
if this packet frame is filled in more than one descriptors.

Signed-off-by: Bruce Liu <damuzi000@gmail.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 797b56a..47f2287 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -332,7 +332,7 @@ static void stmmac_get_tx_hwtstamp(struct stmmac_priv *priv,
 		return;
 
 	/* exit if skb doesn't support hw tstamp */
-	if (likely(!(skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS)))
+	if (likely(!skb || !(skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS)))
 		return;
 
 	if (priv->adv_ts)
-- 
1.7.9.5

^ permalink raw reply related

* Re: [Xen-devel] [PATCH net-next] xen-netfront: clean up code in xennet_release_rx_bufs
From: Wei Liu @ 2014-01-15 14:32 UTC (permalink / raw)
  To: annie li; +Cc: David Vrabel, Wei Liu, xen-devel, netdev, ian.campbell
In-Reply-To: <52D69864.9030207@oracle.com>

On Wed, Jan 15, 2014 at 10:17:08PM +0800, annie li wrote:
[...]
> >Having gnttab_end_foreign_access() do a free just looks odd to me, the
> >free isn't paired with any alloc in the grant table code.
> >
> >It seems more logical to me that granting access takes an additional
> >page ref, and then ending access releases that ref.
> 
> I am thinking of two ways, and they can be implemented in new patches.
> 1. If gnttab_end_foreign_access_ref succeeds, then kfree_skb is
> called to free skb. Otherwise, using gnttab_end_foreign_access to
> release ref and pages.

This is probably not a very good idea as skb_free does a lot more things
than simply freeing pages. You're still leaking something (skb: state,
head etc.) with this approach.

> 2. Add a similar deferred way of gnttab_end_foreign_access in
> gnttab_end_foreign_access_ref.
> 
> Thanks
> Annie
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH v5 1/2] ipv6 addrconf: add IFA_F_NOPREFIXROUTE flag to suppress creation of IP6 routes
From: Thomas Haller @ 2014-01-15 14:36 UTC (permalink / raw)
  To: David Miller; +Cc: hannes, jiri, netdev, stephen, dcbw, Thomas Haller
In-Reply-To: <1389796619-13440-1-git-send-email-thaller@redhat.com>

When adding/modifying an IPv6 address, the userspace application needs
a way to suppress adding a prefix route. This is for example relevant
together with IFA_F_MANAGERTEMPADDR, where userspace creates autoconf
generated addresses, but depending on on-link, no route for the
prefix should be added.

Signed-off-by: Thomas Haller <thaller@redhat.com>
---
 include/uapi/linux/if_addr.h |  1 +
 net/ipv6/addrconf.c          | 19 +++++++++++++------
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/if_addr.h b/include/uapi/linux/if_addr.h
index cfed10b..dea10a8 100644
--- a/include/uapi/linux/if_addr.h
+++ b/include/uapi/linux/if_addr.h
@@ -49,6 +49,7 @@ enum {
 #define IFA_F_TENTATIVE		0x40
 #define IFA_F_PERMANENT		0x80
 #define IFA_F_MANAGETEMPADDR	0x100
+#define IFA_F_NOPREFIXROUTE	0x200
 
 struct ifa_cacheinfo {
 	__u32	ifa_prefered;
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 1b2e4ee..85100c1 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2434,8 +2434,11 @@ static int inet6_addr_add(struct net *net, int ifindex,
 			    valid_lft, prefered_lft);
 
 	if (!IS_ERR(ifp)) {
-		addrconf_prefix_route(&ifp->addr, ifp->prefix_len, dev,
-				      expires, flags);
+		if (!(ifa_flags & IFA_F_NOPREFIXROUTE)) {
+			addrconf_prefix_route(&ifp->addr, ifp->prefix_len, dev,
+					      expires, flags);
+		}
+
 		/*
 		 * Note that section 3.1 of RFC 4429 indicates
 		 * that the Optimistic flag should not be set for
@@ -3662,7 +3665,8 @@ static int inet6_addr_modify(struct inet6_ifaddr *ifp, u32 ifa_flags,
 	spin_lock_bh(&ifp->lock);
 	was_managetempaddr = ifp->flags & IFA_F_MANAGETEMPADDR;
 	ifp->flags &= ~(IFA_F_DEPRECATED | IFA_F_PERMANENT | IFA_F_NODAD |
-			IFA_F_HOMEADDRESS | IFA_F_MANAGETEMPADDR);
+			IFA_F_HOMEADDRESS | IFA_F_MANAGETEMPADDR |
+			IFA_F_NOPREFIXROUTE);
 	ifp->flags |= ifa_flags;
 	ifp->tstamp = jiffies;
 	ifp->valid_lft = valid_lft;
@@ -3672,8 +3676,10 @@ static int inet6_addr_modify(struct inet6_ifaddr *ifp, u32 ifa_flags,
 	if (!(ifp->flags&IFA_F_TENTATIVE))
 		ipv6_ifa_notify(0, ifp);
 
-	addrconf_prefix_route(&ifp->addr, ifp->prefix_len, ifp->idev->dev,
-			      expires, flags);
+	if (!(ifa_flags & IFA_F_NOPREFIXROUTE)) {
+		addrconf_prefix_route(&ifp->addr, ifp->prefix_len, ifp->idev->dev,
+				      expires, flags);
+	}
 
 	if (was_managetempaddr || ifp->flags & IFA_F_MANAGETEMPADDR) {
 		if (was_managetempaddr && !(ifp->flags & IFA_F_MANAGETEMPADDR))
@@ -3727,7 +3733,8 @@ inet6_rtm_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh)
 	ifa_flags = tb[IFA_FLAGS] ? nla_get_u32(tb[IFA_FLAGS]) : ifm->ifa_flags;
 
 	/* We ignore other flags so far. */
-	ifa_flags &= IFA_F_NODAD | IFA_F_HOMEADDRESS | IFA_F_MANAGETEMPADDR;
+	ifa_flags &= IFA_F_NODAD | IFA_F_HOMEADDRESS | IFA_F_MANAGETEMPADDR |
+		     IFA_F_NOPREFIXROUTE;
 
 	ifa = ipv6_get_ifaddr(net, pfx, dev, 1);
 	if (ifa == NULL) {
-- 
1.8.4.2

^ permalink raw reply related

* [PATCH v5 0/2] ipv6 addrconf: add IFA_F_NOPREFIXROUTE flag to suppress creation of IP6 routes
From: Thomas Haller @ 2014-01-15 14:36 UTC (permalink / raw)
  To: David Miller; +Cc: hannes, jiri, netdev, stephen, dcbw, Thomas Haller
In-Reply-To: <20140110.181030.2081631538801450145.davem@davemloft.net>

v1 -> v2: add a second commit, handling NOPREFIXROUTE in ip6_del_addr.
v2 -> v3: reword commit messages, code comments and some refactoring.
v3 -> v4: refactor, rename variables, add enum
v4 -> v5: rebase, so that patch applies cleanly to current net-next/master

Thomas Haller (2):
  ipv6 addrconf: add IFA_F_NOPREFIXROUTE flag to suppress creation of
    IP6 routes
  ipv6 addrconf: don't cleanup prefix route for IFA_F_NOPREFIXROUTE

 include/uapi/linux/if_addr.h |   1 +
 net/ipv6/addrconf.c          | 203 ++++++++++++++++++++++++++-----------------
 2 files changed, 123 insertions(+), 81 deletions(-)

-- 
1.8.4.2

^ permalink raw reply

* [PATCH v5 2/2] ipv6 addrconf: don't cleanup prefix route for IFA_F_NOPREFIXROUTE
From: Thomas Haller @ 2014-01-15 14:36 UTC (permalink / raw)
  To: David Miller; +Cc: hannes, jiri, netdev, stephen, dcbw, Thomas Haller
In-Reply-To: <1389796619-13440-1-git-send-email-thaller@redhat.com>

Refactor the deletion/update of prefix routes when removing an
address. Now also consider IFA_F_NOPREFIXROUTE and if there is an address
present with this flag, to not cleanup the route. Instead, assume
that userspace is taking care of this route.

Also perform the same cleanup, when userspace changes an existing address
to add NOPREFIXROUTE (to an address that didn't have this flag). This is
done because when the address was added, a prefix route was created for it.
Since the user now wants to handle this route by himself, we cleanup this
route.

This cleanup of the route is not totally robust. There is no guarantee,
that the route we are about to delete was really the one added by the
kernel. This behavior does not change by the patch, and in practice it
should work just fine.

Signed-off-by: Thomas Haller <thaller@redhat.com>
---
 net/ipv6/addrconf.c | 184 +++++++++++++++++++++++++++++++---------------------
 1 file changed, 109 insertions(+), 75 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 85100c1..6913a82 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -900,15 +900,95 @@ out:
 	goto out2;
 }
 
+enum cleanup_prefix_rt_t {
+	CLEANUP_PREFIX_RT_NOP,    /* no cleanup action for prefix route */
+	CLEANUP_PREFIX_RT_DEL,    /* delete the prefix route */
+	CLEANUP_PREFIX_RT_EXPIRE, /* update the lifetime of the prefix route */
+};
+
+/*
+ * Check, whether the prefix for ifp would still need a prefix route
+ * after deleting ifp. The function returns one of the CLEANUP_PREFIX_RT_*
+ * constants.
+ *
+ * 1) we don't purge prefix if address was not permanent.
+ *    prefix is managed by its own lifetime.
+ * 2) we also don't purge, if the address was IFA_F_NOPREFIXROUTE.
+ * 3) if there are no addresses, delete prefix.
+ * 4) if there are still other permanent address(es),
+ *    corresponding prefix is still permanent.
+ * 5) if there are still other addresses with IFA_F_NOPREFIXROUTE,
+ *    don't purge the prefix, assume user space is managing it.
+ * 6) otherwise, update prefix lifetime to the
+ *    longest valid lifetime among the corresponding
+ *    addresses on the device.
+ *    Note: subsequent RA will update lifetime.
+ **/
+static enum cleanup_prefix_rt_t
+check_cleanup_prefix_route(struct inet6_ifaddr *ifp, unsigned long *expires)
+{
+	struct inet6_ifaddr *ifa;
+	struct inet6_dev *idev = ifp->idev;
+	unsigned long lifetime;
+	enum cleanup_prefix_rt_t action = CLEANUP_PREFIX_RT_DEL;
+
+	*expires = jiffies;
+
+	list_for_each_entry(ifa, &idev->addr_list, if_list) {
+		if (ifa == ifp)
+			continue;
+		if (!ipv6_prefix_equal(&ifa->addr, &ifp->addr,
+				       ifp->prefix_len))
+			continue;
+		if (ifa->flags & (IFA_F_PERMANENT | IFA_F_NOPREFIXROUTE))
+			return CLEANUP_PREFIX_RT_NOP;
+
+		action = CLEANUP_PREFIX_RT_EXPIRE;
+
+		spin_lock(&ifa->lock);
+
+		lifetime = addrconf_timeout_fixup(ifa->valid_lft, HZ);
+		/*
+		 * Note: Because this address is
+		 * not permanent, lifetime <
+		 * LONG_MAX / HZ here.
+		 */
+		if (time_before(*expires, ifa->tstamp + lifetime * HZ))
+			*expires = ifa->tstamp + lifetime * HZ;
+		spin_unlock(&ifa->lock);
+	}
+
+	return action;
+}
+
+static void
+cleanup_prefix_route(struct inet6_ifaddr *ifp, unsigned long expires, bool del_rt)
+{
+	struct rt6_info *rt;
+
+	rt = addrconf_get_prefix_route(&ifp->addr,
+				       ifp->prefix_len,
+				       ifp->idev->dev,
+				       0, RTF_GATEWAY | RTF_DEFAULT);
+	if (rt) {
+		if (del_rt)
+			ip6_del_rt(rt);
+		else {
+			if (!(rt->rt6i_flags & RTF_EXPIRES))
+				rt6_set_expires(rt, expires);
+			ip6_rt_put(rt);
+		}
+	}
+}
+
+
 /* This function wants to get referenced ifp and releases it before return */
 
 static void ipv6_del_addr(struct inet6_ifaddr *ifp)
 {
-	struct inet6_ifaddr *ifa, *ifn;
-	struct inet6_dev *idev = ifp->idev;
 	int state;
-	int deleted = 0, onlink = 0;
-	unsigned long expires = jiffies;
+	enum cleanup_prefix_rt_t action = CLEANUP_PREFIX_RT_NOP;
+	unsigned long expires;
 
 	spin_lock_bh(&ifp->state_lock);
 	state = ifp->state;
@@ -922,7 +1002,7 @@ static void ipv6_del_addr(struct inet6_ifaddr *ifp)
 	hlist_del_init_rcu(&ifp->addr_lst);
 	spin_unlock_bh(&addrconf_hash_lock);
 
-	write_lock_bh(&idev->lock);
+	write_lock_bh(&ifp->idev->lock);
 
 	if (ifp->flags&IFA_F_TEMPORARY) {
 		list_del(&ifp->tmp_list);
@@ -933,45 +1013,13 @@ static void ipv6_del_addr(struct inet6_ifaddr *ifp)
 		__in6_ifa_put(ifp);
 	}
 
-	list_for_each_entry_safe(ifa, ifn, &idev->addr_list, if_list) {
-		if (ifa == ifp) {
-			list_del_init(&ifp->if_list);
-			__in6_ifa_put(ifp);
+	if (ifp->flags & IFA_F_PERMANENT && !(ifp->flags & IFA_F_NOPREFIXROUTE))
+		action = check_cleanup_prefix_route(ifp, &expires);
 
-			if (!(ifp->flags & IFA_F_PERMANENT) || onlink > 0)
-				break;
-			deleted = 1;
-			continue;
-		} else if (ifp->flags & IFA_F_PERMANENT) {
-			if (ipv6_prefix_equal(&ifa->addr, &ifp->addr,
-					      ifp->prefix_len)) {
-				if (ifa->flags & IFA_F_PERMANENT) {
-					onlink = 1;
-					if (deleted)
-						break;
-				} else {
-					unsigned long lifetime;
-
-					if (!onlink)
-						onlink = -1;
-
-					spin_lock(&ifa->lock);
-
-					lifetime = addrconf_timeout_fixup(ifa->valid_lft, HZ);
-					/*
-					 * Note: Because this address is
-					 * not permanent, lifetime <
-					 * LONG_MAX / HZ here.
-					 */
-					if (time_before(expires,
-							ifa->tstamp + lifetime * HZ))
-						expires = ifa->tstamp + lifetime * HZ;
-					spin_unlock(&ifa->lock);
-				}
-			}
-		}
-	}
-	write_unlock_bh(&idev->lock);
+	list_del_init(&ifp->if_list);
+	__in6_ifa_put(ifp);
+
+	write_unlock_bh(&ifp->idev->lock);
 
 	addrconf_del_dad_timer(ifp);
 
@@ -979,38 +1027,9 @@ static void ipv6_del_addr(struct inet6_ifaddr *ifp)
 
 	inet6addr_notifier_call_chain(NETDEV_DOWN, ifp);
 
-	/*
-	 * Purge or update corresponding prefix
-	 *
-	 * 1) we don't purge prefix here if address was not permanent.
-	 *    prefix is managed by its own lifetime.
-	 * 2) if there are no addresses, delete prefix.
-	 * 3) if there are still other permanent address(es),
-	 *    corresponding prefix is still permanent.
-	 * 4) otherwise, update prefix lifetime to the
-	 *    longest valid lifetime among the corresponding
-	 *    addresses on the device.
-	 *    Note: subsequent RA will update lifetime.
-	 *
-	 * --yoshfuji
-	 */
-	if ((ifp->flags & IFA_F_PERMANENT) && onlink < 1) {
-		struct rt6_info *rt;
-
-		rt = addrconf_get_prefix_route(&ifp->addr,
-					       ifp->prefix_len,
-					       ifp->idev->dev,
-					       0, RTF_GATEWAY | RTF_DEFAULT);
-
-		if (rt) {
-			if (onlink == 0) {
-				ip6_del_rt(rt);
-				rt = NULL;
-			} else if (!(rt->rt6i_flags & RTF_EXPIRES)) {
-				rt6_set_expires(rt, expires);
-			}
-		}
-		ip6_rt_put(rt);
+	if (action != CLEANUP_PREFIX_RT_NOP) {
+		cleanup_prefix_route(ifp, expires,
+			action == CLEANUP_PREFIX_RT_DEL);
 	}
 
 	/* clean up prefsrc entries */
@@ -3636,6 +3655,7 @@ static int inet6_addr_modify(struct inet6_ifaddr *ifp, u32 ifa_flags,
 	clock_t expires;
 	unsigned long timeout;
 	bool was_managetempaddr;
+	bool had_prefixroute;
 
 	if (!valid_lft || (prefered_lft > valid_lft))
 		return -EINVAL;
@@ -3664,6 +3684,8 @@ static int inet6_addr_modify(struct inet6_ifaddr *ifp, u32 ifa_flags,
 
 	spin_lock_bh(&ifp->lock);
 	was_managetempaddr = ifp->flags & IFA_F_MANAGETEMPADDR;
+	had_prefixroute = ifp->flags & IFA_F_PERMANENT &&
+			  !(ifp->flags & IFA_F_NOPREFIXROUTE);
 	ifp->flags &= ~(IFA_F_DEPRECATED | IFA_F_PERMANENT | IFA_F_NODAD |
 			IFA_F_HOMEADDRESS | IFA_F_MANAGETEMPADDR |
 			IFA_F_NOPREFIXROUTE);
@@ -3679,6 +3701,18 @@ static int inet6_addr_modify(struct inet6_ifaddr *ifp, u32 ifa_flags,
 	if (!(ifa_flags & IFA_F_NOPREFIXROUTE)) {
 		addrconf_prefix_route(&ifp->addr, ifp->prefix_len, ifp->idev->dev,
 				      expires, flags);
+	} else if (had_prefixroute) {
+		enum cleanup_prefix_rt_t action;
+		unsigned long rt_expires;
+
+		write_lock_bh(&ifp->idev->lock);
+		action = check_cleanup_prefix_route(ifp, &rt_expires);
+		write_unlock_bh(&ifp->idev->lock);
+
+		if (action != CLEANUP_PREFIX_RT_NOP) {
+			cleanup_prefix_route(ifp, rt_expires,
+				action == CLEANUP_PREFIX_RT_DEL);
+		}
 	}
 
 	if (was_managetempaddr || ifp->flags & IFA_F_MANAGETEMPADDR) {
-- 
1.8.4.2

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox