Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH 2/3] net: ethernet: atheros: add CONFIG_PM_SLEEP to suspend/resume functions
From: Jingoo Han @ 2013-03-26  7:03 UTC (permalink / raw)
  To: 'David S. Miller'; +Cc: netdev, 'Jingoo Han'
In-Reply-To: <001b01ce29ef$f09cbf10$d1d63d30$%han@samsung.com>

Add CONFIG_PM_SLEEP to suspend/resume functions to fix the following
build warning when CONFIG_PM_SLEEP is not selected. This is because
sleep PM callbacks defined by SIMPLE_DEV_PM_OPS are only used when
the CONFIG_PM_SLEEP is enabled.

drivers/net/ethernet/atheros/atlx/atl1.c:2861:12: warning: 'atl1_resume' defined but not used [-Wunused-function]

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
---
 drivers/net/ethernet/atheros/atlx/atl1.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/atheros/atlx/atl1.c b/drivers/net/ethernet/atheros/atlx/atl1.c
index 5b0d993..9948fee 100644
--- a/drivers/net/ethernet/atheros/atlx/atl1.c
+++ b/drivers/net/ethernet/atheros/atlx/atl1.c
@@ -2774,7 +2774,7 @@ static int atl1_close(struct net_device *netdev)
 	return 0;
 }

-#ifdef CONFIG_PM
+#ifdef CONFIG_PM_SLEEP
 static int atl1_suspend(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
-- 
1.7.2.5

^ permalink raw reply related

* [PATCH 3/3] net: wireless: iwlegacy: add CONFIG_PM_SLEEP to suspend/resume functions
From: Jingoo Han @ 2013-03-26  7:03 UTC (permalink / raw)
  To: 'David S. Miller'; +Cc: netdev, 'Jingoo Han'
In-Reply-To: <001b01ce29ef$f09cbf10$d1d63d30$%han@samsung.com>

Add CONFIG_PM_SLEEP to suspend/resume functions to fix the following
build warning when CONFIG_PM_SLEEP is not selected. This is because
sleep PM callbacks defined by SIMPLE_DEV_PM_OPS are only used when
the CONFIG_PM_SLEEP is enabled.

drivers/net/wireless/iwlegacy/common.c:4894:1: warning: 'il_pci_suspend' defined but not used [-Wunused-function]
drivers/net/wireless/iwlegacy/common.c:4912:1: warning: 'il_pci_resume' defined but not used [-Wunused-function]

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
---
 drivers/net/wireless/iwlegacy/common.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/iwlegacy/common.c b/drivers/net/wireless/iwlegacy/common.c
index 01e2d2b..5b79819 100644
--- a/drivers/net/wireless/iwlegacy/common.c
+++ b/drivers/net/wireless/iwlegacy/common.c
@@ -4888,7 +4888,7 @@ il_add_beacon_time(struct il_priv *il, u32 base, u32 addon,
 }
 EXPORT_SYMBOL(il_add_beacon_time);

-#ifdef CONFIG_PM
+#ifdef CONFIG_PM_SLEEP

 static int
 il_pci_suspend(struct device *device)
@@ -4939,7 +4939,7 @@ il_pci_resume(struct device *device)
 SIMPLE_DEV_PM_OPS(il_pm_ops, il_pci_suspend, il_pci_resume);
 EXPORT_SYMBOL(il_pm_ops);

-#endif /* CONFIG_PM */
+#endif /* CONFIG_PM_SLEEP */

 static void
 il_update_qos(struct il_priv *il)
-- 
1.7.2.5

^ permalink raw reply related

* [PATCH] pch_gbe: fix ip_summed checksum reporting on rx
From: Veaceslav Falico @ 2013-03-26  8:26 UTC (permalink / raw)
  To: netdev; +Cc: richardcochran, tshimizu818, vfalico, andy.cress

skb->ip_summed should be CHECKSUM_UNNECESSARY when the driver reports that
checksums were correct and CHECKSUM_NONE in any other case. They're
currently placed vice versa, which breaks the forwarding scenario. Fix it
by placing them as described above.

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---
 .../net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c   |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
index 39ab4d0..73ce7dd 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
@@ -1726,9 +1726,9 @@ pch_gbe_clean_rx(struct pch_gbe_adapter *adapter,
 
 			skb->protocol = eth_type_trans(skb, netdev);
 			if (tcp_ip_status & PCH_GBE_RXD_ACC_STAT_TCPIPOK)
-				skb->ip_summed = CHECKSUM_NONE;
-			else
 				skb->ip_summed = CHECKSUM_UNNECESSARY;
+			else
+				skb->ip_summed = CHECKSUM_NONE;
 
 			napi_gro_receive(&adapter->napi, skb);
 			(*work_done)++;
-- 
1.7.1

^ permalink raw reply related

* Re: [PATCH 1/2] man/send(2): add EPERM to the list of possible errors
From: Fernando Luis Vazquez Cao @ 2013-03-26  8:37 UTC (permalink / raw)
  To: Michael Kerrisk
  Cc: linux-man-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	netfilter-devel-u79uwXL29TY76Z2rM5mHXA, Pablo Neira Ayuso,
	Patrick McHardy, Hirotaka Sasaki
In-Reply-To: <1363675513.4767.6.camel@nexus>

Hi Michael,

Do you see any problem with these two patches?

Thanks,
Fernando


On 2013/03/19 15:45, Fernando Luis Vázquez Cao wrote:
> Subject: [PATCH 1/2] man/send(2): add EPERM to the list of possible errors
>
> System policy (for example netfilter rule) can cause a send* operation
> to fail with EPERM.
>
> Reported-by: Hirotaka Sasaki <sasaki.hirotaka-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
> Signed-off-by: Fernando Luis Vazquez Cao <fernando-gVGce1chcLdL9jVzuh4AOg@public.gmane.org>
> ---
>
> diff -urNp man-pages-3.50-orig/man2/send.2 man-pages-3.50/man2/send.2
> --- man-pages-3.50-orig/man2/send.2	2013-03-15 16:17:32.000000000 +0900
> +++ man-pages-3.50/man2/send.2	2013-03-19 15:17:03.616008275 +0900
> @@ -357,6 +357,10 @@ Some bit in the
>   .I flags
>   argument is inappropriate for the socket type.
>   .TP
> +.B EPERM
> +System policy (for example a netfilter rule) does not permit the requested
> +operation.
> +.TP
>   .B EPIPE
>   The local end has been shut down on a connection oriented socket.
>   In this case the process
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v2 net-next] netlink: have length check of rtnl msg before deref
From: Thomas Graf @ 2013-03-26  8:59 UTC (permalink / raw)
  To: Hong Zhiguo; +Cc: netdev, davem, stephen
In-Reply-To: <1364268993-19947-1-git-send-email-honkiko@gmail.com>

On 03/26/13 at 11:36am, Hong Zhiguo wrote:
> When the legacy array rtm_min still exists, the length check within
> these functions is covered by rtm_min[RTM_NEWTFILTER],
> rtm_min[RTM_NEWQDISC] and rtm_min[RTM_NEWTCLASS].
> 
> But after Thomas Graf removed rtm_min several days ago, these checks
> are missing. Other doit functions should be OK.
> 
> Signed-off-by: Hong Zhiguo <honkiko@gmail.com>

Acked-by: Thomas Graf <tgraf@suug.ch>

^ permalink raw reply

* RE: [Xen-devel] [PATCH 5/6] xen-netback: coalesce slots before copying
From: Paul Durrant @ 2013-03-26  9:16 UTC (permalink / raw)
  To: Wei Liu, David Vrabel
  Cc: Ian Campbell, konrad.wilk@oracle.com, netdev@vger.kernel.org,
	Wei Liu, xen-devel@lists.xen.org, annie.li@oracle.com
In-Reply-To: <20130325190911.GC7004@zion.uk.xensource.com>

> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
> bounces@lists.xen.org] On Behalf Of Wei Liu
> Sent: 25 March 2013 19:09
> To: David Vrabel
> Cc: Ian Campbell; konrad.wilk@oracle.com; netdev@vger.kernel.org; Wei
> Liu; xen-devel@lists.xen.org; annie.li@oracle.com
> Subject: Re: [Xen-devel] [PATCH 5/6] xen-netback: coalesce slots before
> copying
> 
> On Mon, Mar 25, 2013 at 06:29:29PM +0000, David Vrabel wrote:
> > >>
> > >
> > > Are you suggesting move the default macro value to header file? It is
> > > just an estimation, I have no knowledge of the accurate maximum value,
> > > so I think make it part of the protocol a bad idea.
> >
> > How is the author of a new frontend supposed to know how many slots
> they
> > can use per packet if it is not precisely defined?
> >
> 
> A new frontend shuold use the scheme you mentioned below to get the
> maximum value. For old frontends that cannot be fixed, administrator can
> configure max_skb_slots to accommodate their need.
> 
> > > Do you have a handle on the maximum value?
> >
> > Backends should provide the value to the frontend via a xenstore key
> > (e.g., max-slots-per-frame).  This value should be at least 18 (the
> > historical value of MAX_SKB_FRAGS).
> >
> > The frontend may use up to this specified value or 17 if the
> > max-slots-per-frame key is missing.
> >
> > Supporting at least 18 in the backend is required for existing
> > frontends.  Limiting frontends to 17 allows them to work with all
> > backends (including recent Linux version that only supported 17).
> >
> > It's not clear why 19 or 20 were suggested as possible values.  I
> > checked back to 2.6.18 and MAX_SKB_FRAGS there is (65536/PAGE_SIZE +
> 2)
> 
> Because the check is >= MAX_SKB_FRAGS originally and James Harper told
> me that "Windows stops counting on 20".
> 

For the Citrix PV drivers I lifted the #define of MAX_SKB_FRAGS from the dom0 kernel (i.e. 18). If a packet coming from the stack has more than that number of fragments then it's copied and coalesced. The value advertised for TSO size is chosen such that a maximally sized TSO will always fit in 18 fragments after coalescing but (since this is Windows) the drivers don't trust the stack to stick to that limit and will drop a packet if it won't fit.

It seems reasonable that, since the backend is copying anyway, that it should handle any fragment list coming from the frontend that it can. This would allow the copy-and-coalesce code to be removed from the frontend (and the double-copy avoided). If there is a maximum backend packet size though then I think this needs to be advertised to the frontend. The backend should clearly bin packets coming from the frontend that exceed that limit but advertising that limit in xenstore allows the frontend to choose the right TSO maximum size to advertise to its stack, rather than having to make it based on some historical value that actually has little meaning (in the absence of grant mapping).

  Paul

^ permalink raw reply

* Re: [PATCH regression/bisected] Revert "brcmsmac: support 4313iPA"
From: Piotr Haber @ 2013-03-26  9:31 UTC (permalink / raw)
  To: John W. Linville
  Cc: David Herrmann, linux-wireless, linux-kernel, Arend van Spriel,
	brcm80211-dev-list, netdev, Pieter-Paul Giesberts
In-Reply-To: <20130325185821.GB17454@tuxdriver.com>

On 03/25/13 19:58, John W. Linville wrote:
> On Mon, Mar 18, 2013 at 02:58:08PM +0100, David Herrmann wrote:
>> Hi Piotr
>>
>> On Mon, Mar 18, 2013 at 2:49 PM, Piotr Haber <phaber@broadcom.com> wrote:
>>> On 03/18/13 11:45, David Herrmann wrote:
>>>> This reverts commit b6fc28a158076ca2764edc9a6d1e1402f56e1c0c. It breaks
>>>> wireless AP reconnection on: (14e4:4727)
>>>>   Broadcom Corporation BCM4313 802.11b/g/n Wireless LAN Controller
>>>>
>>>> Any attempt to reconnect to an AP results in timeouts no matter how near to the
>>>> AP I am:
>>>>  00:10:40 $nb kernel: wlan0: authenticate with 00:18:39:0a:8e:23
>>>>  00:10:40 $nb kernel: wlan0: direct probe to 00:18:39:0a:8e:23 (try 1/3)
>>>>  00:10:40 $nb kernel: wlan0: direct probe to 00:18:39:0a:8e:23 (try 2/3)
>>>>  00:10:41 $nb kernel: wlan0: direct probe to 00:18:39:0a:8e:23 (try 3/3)
>>>>  00:10:41 $nb kernel: wlan0: authentication with 00:18:39:0a:8e:23 timed out
>>>> ---
>>>> Hi
>>>>
>>>> I tried coming up with a fix instead of reverting this commit, but the commit is
>>>> way to big for me to understand what's going on. Sorry.
>>>>
>>>> With linux-3.8 connecting to an AP broke on my machine. I could connect to an AP
>>>> one time, but any further attempt resulted in:
>>>>  00:10:40 $nb kernel: wlan0: authenticate with 00:18:39:0a:8e:23
>>>>  00:10:40 $nb kernel: wlan0: direct probe to 00:18:39:0a:8e:23 (try 1/3)
>>>>  00:10:40 $nb kernel: wlan0: direct probe to 00:18:39:0a:8e:23 (try 2/3)
>>>>  00:10:41 $nb kernel: wlan0: direct probe to 00:18:39:0a:8e:23 (try 3/3)
>>>>  00:10:41 $nb kernel: wlan0: authentication with 00:18:39:0a:8e:23 timed out
>>>>
>>>> Even sitting right next to the AP didn't help so I started bisecting and it
>>>> turned out to be:
>>>>   "brcmsmac: support 4313iPA" b6fc28a158076ca2764edc9a6d1e1402f56e1c0c
>>>> Please revert it.
>>>>
>>>> Thanks
>>>> David
>>>>
>>> Hi,
>>> unfortunately this is not a first report of this patch breaking 4313 for some users.
>>> I'm pretty confident that it is hardware revision related as we have 4313ePA and iPA boards running
>>> successfully in our test setup.
>>> Could you aid us in effort of finding the problem by supplying the contents of this debugfs file:
>>> <debugfs_mount>/brcmsmac/bcma0:0/hardware
>>
>> Hi
>>
>> $ cat /sys/kernel/debug/brcmsmac/bcma0\:0/hardware
>> board vendor: 185f
>> board type: 51a
>> board revision: 1408
>> board flags: 8402a01
>> board flags2: 880
>> firmware revision: 262032b
>>
>> I can also try partial reverts of that commit, but I really don't know
>> which parts might be important.
> 
> Are we going to see a fix for this (very) soon?  Or should I just go
> ahead and revert this patch?
> 
I cannot reproduce the issue on a set of devices we have here (3 different 4313 ePA models).
Some of the devices that are reported to be broken are being shipped to us.
So I would say we need around 2 weeks at least to resolve this (if we reproduce the problem and find
a fix).
Not sure this is soon enough.
If not please go ahead an revert the patch.

^ permalink raw reply

* Re: [PATCH net-next] ptp: increase the maximum number of clocks
From: Jiri Benc @ 2013-03-26  9:48 UTC (permalink / raw)
  To: Richard Cochran; +Cc: David Miller, netdev, Ben Hutchings
In-Reply-To: <20130325185120.GB3100@netboy.at.omicron.at>

On Mon, 25 Mar 2013 19:51:20 +0100, Richard Cochran wrote:
> While that might work for the PTP ioctls, it won't for the main
> clock_gettime/settime/adjtime call. As you wrote below, these take a
> dynamic clockid_t (which comes from a file descriptor), and so each
> device needs its own, unique file system node.

They need an unique file descriptor, not an unique file. There's nothing
preventing you from opening the same file multiple times, using each
obtained file descriptor to access a different PHC. Sure, there is
currently no place to store a per-fd data in posix-clock implementation
but that could be added.

> For better or worse, we are stuck with char devs, unless we want to
> redo the clock_xyz API. I certainly don't want to. We can have over a
> million PTP clocks on a single major number. That should be enough.
> 
> IIRC, it is possible to grow the range of minor numbers
> incrementally. Perhaps you could take a look at that?

That was what I was originally thinking about. But I don't like it much
- the overhead will increase significantly with increasing number of
devices, as the chrdev allocations for a given major are kept in a
linked list which has to be completely traversed on each allocation.
We'd also need to replace the ptp_clocks_map with a different data
structure (e.g. a linked list of bitmaps if we allocate minors in
chunks to alleviate the chrdev allocation overhead), not helping the
overhead.

Please correct me if I misunderstood something.

Thanks for the ideas,

 Jiri

-- 
Jiri Benc

^ permalink raw reply

* Re: [Xen-devel] [PATCH 5/6] xen-netback: coalesce slots before copying
From: Wei Liu @ 2013-03-26  9:51 UTC (permalink / raw)
  To: Paul Durrant
  Cc: David Vrabel, Ian Campbell, konrad.wilk@oracle.com,
	netdev@vger.kernel.org, Wei Liu, xen-devel@lists.xen.org,
	annie.li@oracle.com
In-Reply-To: <291EDFCB1E9E224A99088639C4762022013F7D8E0DC6@LONPMAILBOX01.citrite.net>

On Tue, Mar 26, 2013 at 09:16:10AM +0000, Paul Durrant wrote:
> > >
> > > It's not clear why 19 or 20 were suggested as possible values.  I
> > > checked back to 2.6.18 and MAX_SKB_FRAGS there is (65536/PAGE_SIZE
> > > +
> > 2)
> > 
> > Because the check is >= MAX_SKB_FRAGS originally and James Harper
> > told me that "Windows stops counting on 20".
> > 
> 
> For the Citrix PV drivers I lifted the #define of MAX_SKB_FRAGS from
> the dom0 kernel (i.e. 18). If a packet coming from the stack has more

That means Citrix PV drivers cannot run on a vanilla kernel backend
because upstream MAX_SKB_FRAGS now is 17. The scope of the patch is to
fix this sort of problems without touching frontend because it is
impossible for us to fix every frontend in the world.

> than that number of fragments then it's copied and coalesced. The
> value advertised for TSO size is chosen such that a maximally sized
> TSO will always fit in 18 fragments after coalescing but (since this
> is Windows) the drivers don't trust the stack to stick to that limit
> and will drop a packet if it won't fit.
> 
> It seems reasonable that, since the backend is copying anyway, that it
> should handle any fragment list coming from the frontend that it can.
> This would allow the copy-and-coalesce code to be removed from the
> frontend (and the double-copy avoided). If there is a maximum backend
> packet size though then I think this needs to be advertised to the
> frontend. The backend should clearly bin packets coming from the
> frontend that exceed that limit but advertising that limit in xenstore
> allows the frontend to choose the right TSO maximum size to advertise
> to its stack, rather than having to make it based on some historical
> value that actually has little meaning (in the absence of grant
> mapping).
> 

The historical is choosen based on the idea that it should cope with all
known frontend limits and leave a bit more space for frontends that
slightly cross the line. Advertising through Xenstore is the future plan
as it cannot be done without modifying frontend.


Wei.

^ permalink raw reply

* Re: [PATCH net-next 2/2] net/davinci_emac: use clk_{prepare|unprepare}
From: Sekhar Nori @ 2013-03-26 10:05 UTC (permalink / raw)
  To: Mike Turquette
  Cc: David S. Miller, netdev, davinci-linux-open-source,
	Rafael J. Wysocki
In-Reply-To: <20130325161654.4014.68401@quantum>



On 3/25/2013 9:46 PM, Mike Turquette wrote:
> Quoting Sekhar Nori (2013-03-25 02:25:47)
>> Use clk_prepare()/clk_unprepare() in the driver since common
>> clock framework needs these to be called before clock is enabled.
>>
>> This is in preparation of common clock framework migration
>> for DaVinci.
>>
>> Cc: Mike Turquette <mturquette@linaro.org>
>> Signed-off-by: Sekhar Nori <nsekhar@ti.com>
>> ---
>>  drivers/net/ethernet/ti/davinci_emac.c |   19 +++++++++++++++++--
>>  1 file changed, 17 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c
>> index b1349b5..436296c 100644
>> --- a/drivers/net/ethernet/ti/davinci_emac.c
>> +++ b/drivers/net/ethernet/ti/davinci_emac.c
>> @@ -350,6 +350,7 @@ struct emac_priv {
>>         /*platform specific members*/
>>         void (*int_enable) (void);
>>         void (*int_disable) (void);
>> +       struct clk *clk;
>>  };
>>  
>>  /* EMAC TX Host Error description strings */
>> @@ -1870,19 +1871,29 @@ static int davinci_emac_probe(struct platform_device *pdev)
>>                 dev_err(&pdev->dev, "failed to get EMAC clock\n");
>>                 return -EBUSY;
>>         }
>> +
>> +       rc = clk_prepare(emac_clk);
>> +       if (rc) {
>> +               dev_err(&pdev->dev, "emac clock prepare failed.\n");
>> +               return rc;
>> +       }
>> +
> 
> Is clk_enable ever called for emac_clk here?  I don't see it in the
> diff.  clk_prepare and clk_enable should usually be paired, even if
> simply calling clk_prepare generates the clock signal that your device
> needs.

clk_enable() is called by pm_runtime framework. Without this patch, the 
clk_enable() call from pm_clk_resume() will result in a warning when 
using common clock framework. This issue can actually be fixed by patch
below which fixes this in drivers/base/power/clock_ops.c so 
individual drivers don't have to do this, but we need to careful since 
will break for callers who intend to use the pm_runtime apis from atomic 
context.

Anyway, its probably much better to fix this in pm_runtime framework so I will 
work with Rafael to see if I can come up with something acceptable.

This patch can then be dropped for now, but 1/2 can still be applied.
That one is pretty harmless!

Thanks,
Sekhar

diff --git a/drivers/base/power/clock_ops.c b/drivers/base/power/clock_ops.c
index 9d8fde7..60d389a 100644
--- a/drivers/base/power/clock_ops.c
+++ b/drivers/base/power/clock_ops.c
@@ -230,7 +230,7 @@ int pm_clk_suspend(struct device *dev)
        list_for_each_entry_reverse(ce, &psd->clock_list, node) {
                if (ce->status < PCE_STATUS_ERROR) {
                        if (ce->status == PCE_STATUS_ENABLED)
-                               clk_disable(ce->clk);
+                               clk_disable_unprepare(ce->clk);
                        ce->status = PCE_STATUS_ACQUIRED;
                }
        }
@@ -259,7 +259,7 @@ int pm_clk_resume(struct device *dev)

        list_for_each_entry(ce, &psd->clock_list, node) {
                if (ce->status < PCE_STATUS_ERROR) {
-                       clk_enable(ce->clk);
+                       clk_prepare_enable(ce->clk);
                        ce->status = PCE_STATUS_ENABLED;
                }
        }

^ permalink raw reply related

* [net 0/6][pull request] Intel Wired LAN Driver Updates
From: Jeff Kirsher @ 2013-03-26 10:42 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, gospo, sassmann

This series contains updates to ixgbevf and igb.

The ixgbevf calls to pci_disable_msix() and to free the msix_entries
memory should not occur if device open fails.  Instead they should be
called during device driver removal to balance with the call to
pci_enable_msix() and the call to allocate msix_entries memory
during the device probe and driver load.

The remaining 4 of 5 igb patches are simple 1-3 line patches to fix
several issues such as possible null pointer dereference, PHC stopping
on max frequency, make sensor info static and SR-IOV initialization
reordering.

The remaining igb patch to fix anti-spoofing config fixes a problem
in i350 where anti spoofing configuration was written into a wrong
register.

The following are changes since commit a79ca223e029aa4f09abb337accf1812c900a800:
  ipv6: fix bad free of addrconf_init_net
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net master

Alex Williamson (2):
  igb: Fix null pointer dereference
  igb: SR-IOV init reordering

Jiri Benc (1):
  igb: fix PHC stopping on max freq

Lior Levy (1):
  igb: fix i350 anti spoofing config

Stephen Hemminger (1):
  igb: make sensor info static

xunleer (1):
  ixgbevf: don't release the soft entries

 drivers/net/ethernet/intel/igb/e1000_82575.c      | 33 +++++++++++++----------
 drivers/net/ethernet/intel/igb/igb_hwmon.c        |  2 +-
 drivers/net/ethernet/intel/igb/igb_main.c         |  4 +--
 drivers/net/ethernet/intel/igb/igb_ptp.c          |  2 +-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 24 ++++++++++++++---
 5 files changed, 43 insertions(+), 22 deletions(-)

-- 
1.7.11.7

^ permalink raw reply

* [net 1/6] ixgbevf: don't release the soft entries
From: Jeff Kirsher @ 2013-03-26 10:42 UTC (permalink / raw)
  To: davem; +Cc: xunleer, netdev, gospo, sassmann, Greg Rose, Jeff Kirsher
In-Reply-To: <1364294554-7967-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: xunleer <xunleer.li@huawei.com>

When the ixgbevf driver is opened the request to allocate MSIX irq
vectors may fail.  In that case the driver will call ixgbevf_down()
which will call ixgbevf_irq_disable() to clear the HW interrupt
registers and calls synchronize_irq() using the msix_entries pointer in
the adapter structure.  However, when the function to request the MSIX
irq vectors failed it had already freed the msix_entries which causes
an OOPs from using the NULL pointer in synchronize_irq().

The calls to pci_disable_msix() and to free the msix_entries memory
should not occur if device open fails.  Instead they should be called
during device driver removal to balance with the call to
pci_enable_msix() and the call to allocate msix_entries memory
during the device probe and driver load.

Signed-off-by: Li Xun <xunleer.li@huawei.com>
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 24 +++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index c3db6cd..2b6cb5c 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -944,9 +944,17 @@ free_queue_irqs:
 		free_irq(adapter->msix_entries[vector].vector,
 			 adapter->q_vector[vector]);
 	}
-	pci_disable_msix(adapter->pdev);
-	kfree(adapter->msix_entries);
-	adapter->msix_entries = NULL;
+	/* This failure is non-recoverable - it indicates the system is
+	 * out of MSIX vector resources and the VF driver cannot run
+	 * without them.  Set the number of msix vectors to zero
+	 * indicating that not enough can be allocated.  The error
+	 * will be returned to the user indicating device open failed.
+	 * Any further attempts to force the driver to open will also
+	 * fail.  The only way to recover is to unload the driver and
+	 * reload it again.  If the system has recovered some MSIX
+	 * vectors then it may succeed.
+	 */
+	adapter->num_msix_vectors = 0;
 	return err;
 }
 
@@ -2572,6 +2580,15 @@ static int ixgbevf_open(struct net_device *netdev)
 	struct ixgbe_hw *hw = &adapter->hw;
 	int err;
 
+	/* A previous failure to open the device because of a lack of
+	 * available MSIX vector resources may have reset the number
+	 * of msix vectors variable to zero.  The only way to recover
+	 * is to unload/reload the driver and hope that the system has
+	 * been able to recover some MSIX vector resources.
+	 */
+	if (!adapter->num_msix_vectors)
+		return -ENOMEM;
+
 	/* disallow open during test */
 	if (test_bit(__IXGBEVF_TESTING, &adapter->state))
 		return -EBUSY;
@@ -2628,7 +2645,6 @@ static int ixgbevf_open(struct net_device *netdev)
 
 err_req_irq:
 	ixgbevf_down(adapter);
-	ixgbevf_free_irq(adapter);
 err_setup_rx:
 	ixgbevf_free_all_rx_resources(adapter);
 err_setup_tx:
-- 
1.7.11.7

^ permalink raw reply related

* [net 2/6] igb: fix i350 anti spoofing config
From: Jeff Kirsher @ 2013-03-26 10:42 UTC (permalink / raw)
  To: davem; +Cc: Lior Levy, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1364294554-7967-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Lior Levy <lior.levy@intel.com>

Fix a problem in i350 where anti spoofing configuration was written into a
wrong register.

Signed-off-by: Lior Levy <lior.levy@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/e1000_82575.c | 33 ++++++++++++++++------------
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/e1000_82575.c b/drivers/net/ethernet/intel/igb/e1000_82575.c
index b64542a..12b1d84 100644
--- a/drivers/net/ethernet/intel/igb/e1000_82575.c
+++ b/drivers/net/ethernet/intel/igb/e1000_82575.c
@@ -1818,27 +1818,32 @@ out:
  **/
 void igb_vmdq_set_anti_spoofing_pf(struct e1000_hw *hw, bool enable, int pf)
 {
-	u32 dtxswc;
+	u32 reg_val, reg_offset;
 
 	switch (hw->mac.type) {
 	case e1000_82576:
+		reg_offset = E1000_DTXSWC;
+		break;
 	case e1000_i350:
-		dtxswc = rd32(E1000_DTXSWC);
-		if (enable) {
-			dtxswc |= (E1000_DTXSWC_MAC_SPOOF_MASK |
-				   E1000_DTXSWC_VLAN_SPOOF_MASK);
-			/* The PF can spoof - it has to in order to
-			 * support emulation mode NICs */
-			dtxswc ^= (1 << pf | 1 << (pf + MAX_NUM_VFS));
-		} else {
-			dtxswc &= ~(E1000_DTXSWC_MAC_SPOOF_MASK |
-				    E1000_DTXSWC_VLAN_SPOOF_MASK);
-		}
-		wr32(E1000_DTXSWC, dtxswc);
+		reg_offset = E1000_TXSWC;
 		break;
 	default:
-		break;
+		return;
+	}
+
+	reg_val = rd32(reg_offset);
+	if (enable) {
+		reg_val |= (E1000_DTXSWC_MAC_SPOOF_MASK |
+			     E1000_DTXSWC_VLAN_SPOOF_MASK);
+		/* The PF can spoof - it has to in order to
+		 * support emulation mode NICs
+		 */
+		reg_val ^= (1 << pf | 1 << (pf + MAX_NUM_VFS));
+	} else {
+		reg_val &= ~(E1000_DTXSWC_MAC_SPOOF_MASK |
+			     E1000_DTXSWC_VLAN_SPOOF_MASK);
 	}
+	wr32(reg_offset, reg_val);
 }
 
 /**
-- 
1.7.11.7

^ permalink raw reply related

* [net 5/6] igb: make sensor info static
From: Jeff Kirsher @ 2013-03-26 10:42 UTC (permalink / raw)
  To: davem; +Cc: Stephen Hemminger, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1364294554-7967-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Stephen Hemminger <stephen@networkplumber.org>

Trivial sparse warning.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_hwmon.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_hwmon.c b/drivers/net/ethernet/intel/igb/igb_hwmon.c
index 4623502..0478a1a 100644
--- a/drivers/net/ethernet/intel/igb/igb_hwmon.c
+++ b/drivers/net/ethernet/intel/igb/igb_hwmon.c
@@ -39,7 +39,7 @@
 #include <linux/pci.h>
 
 #ifdef CONFIG_IGB_HWMON
-struct i2c_board_info i350_sensor_info = {
+static struct i2c_board_info i350_sensor_info = {
 	I2C_BOARD_INFO("i350bb", (0Xf8 >> 1)),
 };
 
-- 
1.7.11.7

^ permalink raw reply related

* [net 3/6] igb: Fix null pointer dereference
From: Jeff Kirsher @ 2013-03-26 10:42 UTC (permalink / raw)
  To: davem; +Cc: Alex Williamson, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1364294554-7967-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alex Williamson <alex.williamson@redhat.com>

The max_vfs= option has always been self limiting to the number of VFs
supported by the device.  fa44f2f1 added SR-IOV configuration via
sysfs, but in the process broke this self correction factor.  The
failing path is:

igb_probe
  igb_sw_init
    if (max_vfs > 7) {
        adapter->vfs_allocated_count = 7;
    ...
    igb_probe_vfs
    igb_enable_sriov(, max_vfs)
      if (num_vfs > 7) {
        err = -EPERM;
        ...

This leaves vfs_allocated_count = 7 and vf_data = NULL, so we bomb out
when igb_probe finally calls igb_reset.  It seems like a really bad
idea, and somewhat pointless, to set vfs_allocated_count separate from
vf_data, but limiting max_vfs is enough to avoid the null pointer.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 4dbd629..2ae8886 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2652,7 +2652,7 @@ static int igb_sw_init(struct igb_adapter *adapter)
 		if (max_vfs > 7) {
 			dev_warn(&pdev->dev,
 				 "Maximum of 7 VFs per PF, using max\n");
-			adapter->vfs_allocated_count = 7;
+			max_vfs = adapter->vfs_allocated_count = 7;
 		} else
 			adapter->vfs_allocated_count = max_vfs;
 		if (adapter->vfs_allocated_count)
-- 
1.7.11.7

^ permalink raw reply related

* [net 4/6] igb: SR-IOV init reordering
From: Jeff Kirsher @ 2013-03-26 10:42 UTC (permalink / raw)
  To: davem; +Cc: Alex Williamson, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1364294554-7967-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alex Williamson <alex.williamson@redhat.com>

igb is ineffective at setting a lower total VFs because:

int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs)
{
        ...
        /* Shouldn't change if VFs already enabled */
        if (dev->sriov->ctrl & PCI_SRIOV_CTRL_VFE)
                return -EBUSY;

Swap init ordering.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 2ae8886..8496adf 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2542,8 +2542,8 @@ static void igb_probe_vfs(struct igb_adapter *adapter)
 	if ((hw->mac.type == e1000_i210) || (hw->mac.type == e1000_i211))
 		return;
 
-	igb_enable_sriov(pdev, max_vfs);
 	pci_sriov_set_totalvfs(pdev, 7);
+	igb_enable_sriov(pdev, max_vfs);
 
 #endif /* CONFIG_PCI_IOV */
 }
-- 
1.7.11.7

^ permalink raw reply related

* [net 6/6] igb: fix PHC stopping on max freq
From: Jeff Kirsher @ 2013-03-26 10:42 UTC (permalink / raw)
  To: davem; +Cc: Jiri Benc, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1364294554-7967-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Jiri Benc <jbenc@redhat.com>

For 82576 MAC type, max_adj is reported as 1000000000 ppb. However, if
this value is passed to igb_ptp_adjfreq_82576, incvalue overflows out of
INCVALUE_82576_MASK, resulting in setting of zero TIMINCA.incvalue, stopping
the PHC (instead of going at twice the nominal speed).

Fix the advertised max_adj value to the largest value hardware can handle.
As there is no min_adj value available (-max_adj is used instead), this will
also prevent stopping the clock intentionally. It's probably not a big deal,
other igb MAC types don't support stopping the clock, either.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_ptp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_ptp.c b/drivers/net/ethernet/intel/igb/igb_ptp.c
index 0987822..0a23750 100644
--- a/drivers/net/ethernet/intel/igb/igb_ptp.c
+++ b/drivers/net/ethernet/intel/igb/igb_ptp.c
@@ -740,7 +740,7 @@ void igb_ptp_init(struct igb_adapter *adapter)
 	case e1000_82576:
 		snprintf(adapter->ptp_caps.name, 16, "%pm", netdev->dev_addr);
 		adapter->ptp_caps.owner = THIS_MODULE;
-		adapter->ptp_caps.max_adj = 1000000000;
+		adapter->ptp_caps.max_adj = 999999881;
 		adapter->ptp_caps.n_ext_ts = 0;
 		adapter->ptp_caps.pps = 0;
 		adapter->ptp_caps.adjfreq = igb_ptp_adjfreq_82576;
-- 
1.7.11.7

^ permalink raw reply related

* Re: [PATCH 1/2] man/send(2): add EPERM to the list of possible errors
From: Pablo Neira Ayuso @ 2013-03-26 10:48 UTC (permalink / raw)
  To: Fernando Luis Vazquez Cao
  Cc: Michael Kerrisk, linux-man, netdev, netfilter-devel,
	Patrick McHardy, Hirotaka Sasaki
In-Reply-To: <51515E5E.7020309@lab.ntt.co.jp>

On Tue, Mar 26, 2013 at 05:37:50PM +0900, Fernando Luis Vazquez Cao wrote:
> Hi Michael,
> 
> Do you see any problem with these two patches?

Please, hold on with the second patch.

I'd like to find a possible solution for the EPERM problem that we've
been discussing. It requires some rework and performance evaluation.

^ permalink raw reply

* RE: [Xen-devel] [PATCH 5/6] xen-netback: coalesce slots before copying
From: James Harper @ 2013-03-26 10:52 UTC (permalink / raw)
  To: Wei Liu, David Vrabel
  Cc: Ian Campbell, konrad.wilk@oracle.com, netdev@vger.kernel.org,
	Wei Liu, xen-devel@lists.xen.org, annie.li@oracle.com
In-Reply-To: <20130325190911.GC7004@zion.uk.xensource.com>

> > It's not clear why 19 or 20 were suggested as possible values.  I
> > checked back to 2.6.18 and MAX_SKB_FRAGS there is (65536/PAGE_SIZE +
> 2)
> 
> Because the check is >= MAX_SKB_FRAGS originally and James Harper told
> me that "Windows stops counting on 20".
> 

I've obviously not been clear enough here... GPLPV stopped counting at 20 (only needed to know if <20 or not). Windows itself can submit a packet to NDIS with hundreds of buffers. It doesn't really matter if it's 21 or 1021, I just didn't want to be misquoted.

James

^ permalink raw reply

* RE: [Xen-devel] [PATCH 5/6] xen-netback: coalesce slots before copying
From: James Harper @ 2013-03-26 11:00 UTC (permalink / raw)
  To: Paul Durrant, Wei Liu, David Vrabel
  Cc: Ian Campbell, Wei Liu, netdev@vger.kernel.org,
	konrad.wilk@oracle.com, xen-devel@lists.xen.org,
	annie.li@oracle.com
In-Reply-To: <291EDFCB1E9E224A99088639C4762022013F7D8E0DC6@LONPMAILBOX01.citrite.net>

> > Because the check is >= MAX_SKB_FRAGS originally and James Harper told
> > me that "Windows stops counting on 20".
> >
> 
> For the Citrix PV drivers I lifted the #define of MAX_SKB_FRAGS from the
> dom0 kernel (i.e. 18). If a packet coming from the stack has more than that
> number of fragments then it's copied and coalesced. The value advertised
> for TSO size is chosen such that a maximally sized TSO will always fit in 18
> fragments after coalescing but (since this is Windows) the drivers don't trust
> the stack to stick to that limit and will drop a packet if it won't fit.
> 
> It seems reasonable that, since the backend is copying anyway, that it should
> handle any fragment list coming from the frontend that it can. This would
> allow the copy-and-coalesce code to be removed from the frontend (and the
> double-copy avoided). If there is a maximum backend packet size though
> then I think this needs to be advertised to the frontend. The backend should
> clearly bin packets coming from the frontend that exceed that limit but
> advertising that limit in xenstore allows the frontend to choose the right TSO
> maximum size to advertise to its stack, rather than having to make it based
> on some historical value that actually has little meaning (in the absence of
> grant mapping).
> 

As stated previously, I've observed windows issuing staggering numbers of buffers to NDIS miniport drivers, so you will need to coalesce in a windows driver anyway. I'm not sure what the break even point is but I think it's safe to say that in the choice between using 1000 (worst case) ring slots (with the resulting mapping overheads) and coalescing in the frontend, coalescing is going to be the better option.

James

^ permalink raw reply

* Re: [Xen-devel] [PATCH 5/6] xen-netback: coalesce slots before copying
From: David Vrabel @ 2013-03-26 11:06 UTC (permalink / raw)
  To: James Harper
  Cc: Wei Liu, Ian Campbell, konrad.wilk@oracle.com,
	netdev@vger.kernel.org, Wei Liu, xen-devel@lists.xen.org,
	annie.li@oracle.com
In-Reply-To: <6035A0D088A63A46850C3988ED045A4B3880B43D@BITCOM1.int.sbss.com.au>

On 26/03/13 10:52, James Harper wrote:
>>> It's not clear why 19 or 20 were suggested as possible values.
>>> I checked back to 2.6.18 and MAX_SKB_FRAGS there is
>>> (65536/PAGE_SIZE + 2)
>> 
>> Because the check is >= MAX_SKB_FRAGS originally and James Harper
>> told me that "Windows stops counting on 20".
>> 
> 
> I've obviously not been clear enough here... GPLPV stopped counting
> at 20 (only needed to know if <20 or not). Windows itself can submit
> a packet to NDIS with hundreds of buffers. It doesn't really matter
> if it's 21 or 1021, I just didn't want to be misquoted.

This still isn't clear.  What's the maximum number of ring entries that
GPLPV driver will use per packet?  Are you saying it's 20?  If so how
has the GPLPV driver ever worked well with Linux's netback (with its
historical limit of 18)?

David

^ permalink raw reply

* Re: [Xen-devel] [PATCH 5/6] xen-netback: coalesce slots before copying
From: David Vrabel @ 2013-03-26 11:13 UTC (permalink / raw)
  To: Wei Liu
  Cc: Wei Liu, Ian Campbell, konrad.wilk@oracle.com,
	netdev@vger.kernel.org, xen-devel@lists.xen.org,
	annie.li@oracle.com
In-Reply-To: <20130325190911.GC7004@zion.uk.xensource.com>

On 25/03/13 19:09, Wei Liu wrote:
> On Mon, Mar 25, 2013 at 06:29:29PM +0000, David Vrabel wrote:
>>>>
>>>
>>> Are you suggesting move the default macro value to header file? It is
>>> just an estimation, I have no knowledge of the accurate maximum value,
>>> so I think make it part of the protocol a bad idea.
>>
>> How is the author of a new frontend supposed to know how many slots they
>> can use per packet if it is not precisely defined?
>>
> 
> A new frontend shuold use the scheme you mentioned below to get the
> maximum value. For old frontends that cannot be fixed, administrator can
> configure max_skb_slots to accommodate their need.

I'm happy to the threshold for fatal errors to be configurable via a
module parameter.

>>> Do you have a handle on the maximum value?
>>
>> Backends should provide the value to the frontend via a xenstore key
>> (e.g., max-slots-per-frame).  This value should be at least 18 (the
>> historical value of MAX_SKB_FRAGS).
>>
>> The frontend may use up to this specified value or 17 if the
>> max-slots-per-frame key is missing.
>>
>> Supporting at least 18 in the backend is required for existing
>> frontends.  Limiting frontends to 17 allows them to work with all
>> backends (including recent Linux version that only supported 17).
>>
>> It's not clear why 19 or 20 were suggested as possible values.  I
>> checked back to 2.6.18 and MAX_SKB_FRAGS there is (65536/PAGE_SIZE + 2)
> 
> Because the check is >= MAX_SKB_FRAGS originally and James Harper told
> me that "Windows stops counting on 20".
> 
>> == 18.
>>
>> Separately, it may be sensible for the backend to drop packets with more
>> frags than max-slots-per-frame up to some threshold where anything more
>> is considered malicious (i.e., 1 - 18 slots is a valid packet, 19-20 are
>> dropped and 21 or more is a fatal error).
>>
> 
> Why drop the packet when we are able to process it? Frontend cannot know
> it has crossed the line anyway.

Because it's a change to the protocol and we do not want to do this for
a regression fix.

As a separate fix we can consider increasing the number of slots
per-packet once there is a mechanism to report this to the front end.

David

^ permalink raw reply

* Re: [PATCH net-next] netlink: remove duplicated NLMSG_ALIGN
From: Thomas Graf @ 2013-03-26 11:13 UTC (permalink / raw)
  To: Hong Zhiguo; +Cc: netdev, davem, stephen, zhiguo.hong
In-Reply-To: <1364274245-20689-1-git-send-email-honkiko@gmail.com>

On 03/26/13 at 01:04pm, Hong Zhiguo wrote:
> NLMSG_HDRLEN is already aligned value. It's for directly reference
> without extra alignment.
> 
> The redundant alignment here may confuse the API users.
> 
> Signed-off-by: Hong Zhiguo <honkiko@gmail.com>

Acked-by: Thomas Graf <tgraf@suug.ch>

This is actually an obsoleted API that we only want to keep around
for backwards compatibility with user space. It would be great to
replace all in kernel usages of NLMSG_LENGTH() with the type safe
variants nlmsg_*() in <net/netlink.h>

^ permalink raw reply

* RE: [Xen-devel] [PATCH 5/6] xen-netback: coalesce slots before copying
From: James Harper @ 2013-03-26 11:15 UTC (permalink / raw)
  To: David Vrabel
  Cc: Wei Liu, Ian Campbell, konrad.wilk@oracle.com,
	netdev@vger.kernel.org, Wei Liu, xen-devel@lists.xen.org,
	annie.li@oracle.com
In-Reply-To: <5151812F.3080905@citrix.com>

> 
> On 26/03/13 10:52, James Harper wrote:
> >>> It's not clear why 19 or 20 were suggested as possible values.
> >>> I checked back to 2.6.18 and MAX_SKB_FRAGS there is
> >>> (65536/PAGE_SIZE + 2)
> >>
> >> Because the check is >= MAX_SKB_FRAGS originally and James Harper
> >> told me that "Windows stops counting on 20".
> >>
> >
> > I've obviously not been clear enough here... GPLPV stopped counting
> > at 20 (only needed to know if <20 or not). Windows itself can submit
> > a packet to NDIS with hundreds of buffers. It doesn't really matter
> > if it's 21 or 1021, I just didn't want to be misquoted.
> 
> This still isn't clear.  What's the maximum number of ring entries that
> GPLPV driver will use per packet?  Are you saying it's 20?  If so how
> has the GPLPV driver ever worked well with Linux's netback (with its
> historical limit of 18)?
> 

GPLPV will limit to 19, which I thought was the historic Linux limit but obviously not. I'd better look in to that.

I added a debug statement to catch what Windows would give to GPLPV, and it seemed that the maximum was 20, but then I double checked and GPLPV only needs to know if there are >19 frags or not, so it stops counting at 20. The actual number Windows will use internally is not limited so coalescing is required, and no sane amount of bumping up the Linux limit will reduce the requirement that a Windows driver will need to coalesce.

James

^ permalink raw reply

* RE: [Xen-devel] [PATCH 5/6] xen-netback: coalesce slots before copying
From: Paul Durrant @ 2013-03-26 11:24 UTC (permalink / raw)
  To: James Harper, Wei Liu, David Vrabel
  Cc: Ian Campbell, Wei Liu, netdev@vger.kernel.org,
	konrad.wilk@oracle.com, xen-devel@lists.xen.org,
	annie.li@oracle.com
In-Reply-To: <6035A0D088A63A46850C3988ED045A4B3880B4F4@BITCOM1.int.sbss.com.au>

> -----Original Message-----
> From: James Harper [mailto:james.harper@bendigoit.com.au]
> Sent: 26 March 2013 11:01
> To: Paul Durrant; Wei Liu; David Vrabel
> Cc: Ian Campbell; Wei Liu; netdev@vger.kernel.org;
> konrad.wilk@oracle.com; xen-devel@lists.xen.org; annie.li@oracle.com
> Subject: RE: [Xen-devel] [PATCH 5/6] xen-netback: coalesce slots before
> copying
> 
> > > Because the check is >= MAX_SKB_FRAGS originally and James Harper
> told
> > > me that "Windows stops counting on 20".
> > >
> >
> > For the Citrix PV drivers I lifted the #define of MAX_SKB_FRAGS from the
> > dom0 kernel (i.e. 18). If a packet coming from the stack has more than that
> > number of fragments then it's copied and coalesced. The value advertised
> > for TSO size is chosen such that a maximally sized TSO will always fit in 18
> > fragments after coalescing but (since this is Windows) the drivers don't
> trust
> > the stack to stick to that limit and will drop a packet if it won't fit.
> >
> > It seems reasonable that, since the backend is copying anyway, that it
> should
> > handle any fragment list coming from the frontend that it can. This would
> > allow the copy-and-coalesce code to be removed from the frontend (and
> the
> > double-copy avoided). If there is a maximum backend packet size though
> > then I think this needs to be advertised to the frontend. The backend
> should
> > clearly bin packets coming from the frontend that exceed that limit but
> > advertising that limit in xenstore allows the frontend to choose the right
> TSO
> > maximum size to advertise to its stack, rather than having to make it based
> > on some historical value that actually has little meaning (in the absence of
> > grant mapping).
> >
> 
> As stated previously, I've observed windows issuing staggering numbers of
> buffers to NDIS miniport drivers, so you will need to coalesce in a windows
> driver anyway. I'm not sure what the break even point is but I think it's safe
> to say that in the choice between using 1000 (worst case) ring slots (with the
> resulting mapping overheads) and coalescing in the frontend, coalescing is
> going to be the better option.
> 

Oh quite, if the backend is mapping and not copying then coalescing in the frontend is the right way to go. I guess coalescing once the frag count reaches a full ring count is probably necessary (since we can't push a partial packet) but it would be nice not to have to do it if the backend is going to copy anyway.

  Paul

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox