Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next RFC 1/3] virtio: support for urgent descriptors
From: Rusty Russell @ 2014-10-15  5:40 UTC (permalink / raw)
  To: Jason Wang, mst, virtualization, netdev, linux-kernel; +Cc: linux-api, kvm
In-Reply-To: <1413011806-3813-2-git-send-email-jasowang@redhat.com>

Jason Wang <jasowang@redhat.com> writes:
> Below should be useful for some experiments Jason is doing.
> I thought I'd send it out for early review/feedback.
>
> event idx feature allows us to defer interrupts until
> a specific # of descriptors were used.
> Sometimes it might be useful to get an interrupt after
> a specific descriptor, regardless.
> This adds a descriptor flag for this, and an API
> to create an urgent output descriptor.
> This is still an RFC:
> we'll need a feature bit for drivers to detect this,
> but we've run out of feature bits for virtio 0.X.
> For experimentation purposes, drivers can assume
> this is set, or add a driver-specific feature bit.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

The new VRING_DESC_F_URGENT bit is theoretically nicer, but for
networking (which tends to take packets in order) couldn't we just set
the event counter to give us a tx interrupt at the packet we want?

Cheers,
Rusty.

^ permalink raw reply

* RE: [PATCH] net: use hardware buffer pool to allocate skb
From: Jiafei.Pan @ 2014-10-15  5:34 UTC (permalink / raw)
  To: David Miller
  Cc: jkosina@suse.cz, netdev@vger.kernel.org, LeoLi@freescale.com,
	linux-doc@vger.kernel.org, Jiafei.Pan@freescale.com
In-Reply-To: <20141015.002514.384962932982389732.davem@davemloft.net>



> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Wednesday, October 15, 2014 12:25 PM
> To: Pan Jiafei-B37022
> Cc: jkosina@suse.cz; netdev@vger.kernel.org; Li Yang-Leo-R58472; linux-
> doc@vger.kernel.org
> Subject: Re: [PATCH] net: use hardware buffer pool to allocate skb
> 
> From: Pan Jiafei <Jiafei.Pan@freescale.com>
> Date: Wed, 15 Oct 2014 11:26:11 +0800
> 
> > In some platform, there are some hardware block provided
> > to manage buffers to improve performance. So in some case,
> > it is expected that the packets received by some generic
> > NIC should be put into such hardware managed buffers
> > directly, so that such buffer can be released by hardware
> > or by driver.
> >
> > This patch provide such general APIs for generic NIC to
> > use hardware block managed buffers without any modification
> > for generic NIC drivers.
> 
> Why are existing interfaces insufficent for your needs?
> 
> Several ethernet drivers already build SKBs from block
> buffer pools.
> 
Yes, for this matter, in order to do this we should modify the Ethernet
drivers. For example, driver A want to driver B, C, D.. to support driver
A's Hardware block access functions, so we have to modify driver B, C, D...
It will be so complex for this matter.

But by using my patch, I just modify a Ethernet device (I don't care
Which driver it is used) flag in driver A in order to implement this
Ethernet device using hardware block access functions provided by
Driver A.

> They allocate pools of pages which the hardware divides into various
> powers of 2, then the RX descriptor says what pieces of which pools
> were used to hold the data for a packet, and then the SKB is
> constructed with page frags pointing to those locations.
> 
> It's very cheap, inexpensive, and existing APIs are considered to
> cover all use cases.

^ permalink raw reply

* [PATCH (net.git) 1/1] stmmac: fix sti compatibililies
From: Giuseppe Cavallaro @ 2014-10-15  5:30 UTC (permalink / raw)
  To: netdev; +Cc: davem, sfr, linux-kernel, linux-next, Giuseppe Cavallaro

this patch is to fix the stmmac data compatibilities for
all the SoCs inside the platform file.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac.h       |    3 ++-
 .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |    8 ++++----
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index 4452889..c3c4065 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -144,7 +144,8 @@ extern const struct stmmac_of_data meson6_dwmac_data;
 extern const struct stmmac_of_data sun7i_gmac_data;
 #endif
 #ifdef CONFIG_DWMAC_STI
-extern const struct stmmac_of_data sti_gmac_data;
+extern const struct stmmac_of_data stih4xx_dwmac_data;
+extern const struct stmmac_of_data stid127_dwmac_data;
 #endif
 #ifdef CONFIG_DWMAC_SOCFPGA
 extern const struct stmmac_of_data socfpga_gmac_data;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index 0d6b9ad..db56fa7 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -37,10 +37,10 @@ static const struct of_device_id stmmac_dt_ids[] = {
 	{ .compatible = "allwinner,sun7i-a20-gmac", .data = &sun7i_gmac_data},
 #endif
 #ifdef CONFIG_DWMAC_STI
-	{ .compatible = "st,stih415-dwmac", .data = &sti_gmac_data},
-	{ .compatible = "st,stih416-dwmac", .data = &sti_gmac_data},
-	{ .compatible = "st,stid127-dwmac", .data = &sti_gmac_data},
-	{ .compatible = "st,stih407-dwmac", .data = &sti_gmac_data},
+	{ .compatible = "st,stih415-dwmac", .data = &stih4xx_dwmac_data},
+	{ .compatible = "st,stih416-dwmac", .data = &stih4xx_dwmac_data},
+	{ .compatible = "st,stid127-dwmac", .data = &stid127_dwmac_data},
+	{ .compatible = "st,stih407-dwmac", .data = &stih4xx_dwmac_data},
 #endif
 #ifdef CONFIG_DWMAC_SOCFPGA
 	{ .compatible = "altr,socfpga-stmmac", .data = &socfpga_gmac_data },
-- 
1.7.4.4

^ permalink raw reply related

* Re: linux-next: build failure after merge of the net tree
From: Giuseppe CAVALLARO @ 2014-10-15  5:24 UTC (permalink / raw)
  To: David Miller, sfr; +Cc: netdev, linux-next, linux-kernel
In-Reply-To: <20141014.220254.144210672646232483.davem@davemloft.net>

On 10/15/2014 4:02 AM, David Miller wrote:
> From: Stephen Rothwell <sfr@canb.auug.org.au>
> Date: Wed, 15 Oct 2014 10:44:11 +1100
>
>> Hi all,
>>
>> After merging the net tree, today's linux-next build (arm
>> multi_v7_defconfig) failed like this:
>>
>> drivers/built-in.o: In function `.LANCHOR0':
>> :(.rodata+0x6b764): undefined reference to `sti_gmac_data'
>> :(.rodata+0x6b828): undefined reference to `sti_gmac_data'
>> :(.rodata+0x6b8ec): undefined reference to `sti_gmac_data'
>> :(.rodata+0x6b9b0): undefined reference to `sti_gmac_data'
>>
>> Caused by commit 53b26b9bc9a5 ("stmmac: dwmac-sti: review the
>> glue-logic for STi4xx and STiD127 SoCs") which renamed sti_gmac_data to
>> stih4xx_dwmac_data (or something) without updating all the references
>> to it (including the one added in the previous commit ...).
>>
>> I reverted that commit for today.
>
> Sigh, Giuseppe if I don't see a proper fix by tomorrow I'm
> reverting all of the stmmac changes I applied today.

I am looking at this now and let you know soon

peppe

^ permalink raw reply

* Re: [PATCH] net: use hardware buffer pool to allocate skb
From: Oliver Hartkopp @ 2014-10-15  4:59 UTC (permalink / raw)
  To: Pan Jiafei, davem, jkosina; +Cc: netdev, LeoLi, linux-doc
In-Reply-To: <1413343571-33231-1-git-send-email-Jiafei.Pan@freescale.com>

On 15.10.2014 05:26, Pan Jiafei wrote:
> In some platform, there are some hardware block provided
> to manage buffers to improve performance.

(..)

> diff --git a/net/Kconfig b/net/Kconfig
> index d6b138e..346e021 100644
> --- a/net/Kconfig
> +++ b/net/Kconfig
> @@ -291,6 +291,16 @@ config NET_FLOW_LIMIT
>   	  with many clients some protection against DoS by a single (spoofed)
>   	  flow that greatly exceeds average workload.
>
> +config USE_HW_SKB
> +	bool "NIC use hardware managed buffer to build skb"
> +	depends on INET

The feature seems to be valid for network devices in general.
Why did you make it depending on INET ??

Regards,
Oliver

> +	---help---
> +	  If select this, the third party drivers will use hardware managed
> +	  buffers to allocate SKB without any modification for the driver.
> +
> +	  Documentation on how to use this featue can be found at
> +	  <file:Documentation/networking/hw_skb.txt>.
> +
>   menu "Network testing"
>
>   config NET_PKTGEN


^ permalink raw reply

* Re: [PATCH v2] ipv4: dst_entry leak in ip_append_data()
From: Eric Dumazet @ 2014-10-15  4:46 UTC (permalink / raw)
  To: Vasily Averin
  Cc: netdev, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy
In-Reply-To: <543CAD2A.3070701@parallels.com>

On Tue, 2014-10-14 at 08:57 +0400, Vasily Averin wrote:
> v2: adjust the indentation of the arguments __ip_append_data() call
> 
> Fixes: 2e77d89b2fa8 ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()")
> 
> If sk_write_queue is empty ip_append_data() executes ip_setup_cork()
> that "steals" dst entry from rt to cork. Later it calls __ip_append_data()
> that creates skb and adds it to sk_write_queue.
> 
> If skb was added successfully following ip_push_pending_frames() call
> reassign dst entries from cork to skb, and kfree_skb frees dst_entry.
> 
> However nobody frees stolen dst_entry if skb was not added into sk_write_queue.

I thought this was done by ip_flush_pending_frames() ?

Can you describe the issue more precisely ?

Thanks !

^ permalink raw reply

* Re: [PATCH RFC] virtio_net: enable tx interrupt
From: Jason Wang @ 2014-10-15  4:40 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel; +Cc: netdev, virtualization
In-Reply-To: <543DF92C.6060209@redhat.com>

On 10/15/2014 12:33 PM, Jason Wang wrote:
> On 10/15/2014 07:11 AM, Michael S. Tsirkin wrote:
>> > On Wed, Oct 15, 2014 at 12:53:59AM +0300, Michael S. Tsirkin wrote:
>>>> >> >  static void skb_xmit_done(struct virtqueue *vq)
>>>> >> >  {
>>>> >> >  	struct virtnet_info *vi = vq->vdev->priv;
>>>> >> > +	struct send_queue *sq = &vi->sq[vq2txq(vq)];
>>>> >> >  
>>>> >> > -	/* Suppress further interrupts. */
>>>> >> > -	virtqueue_disable_cb(vq);
>>>> >> > -
>> > One note here: current code seems racy because of doing
>> > virtqueue_disable_cb from skb_xmit_done that I'm dropping here: there's
>> > no guarantee we don't get an interrupt while tx ring is running, and if
>> > that happens we can end up with interrupts disabled forever.
>> >
> Looks harmless since:
>
> - if event index is enabled, virtqueue_disable_cb() does nothing in fact.
> - if event index is disabled, we don't depend on tx interrupt and when
> num_free is low we will try to enable the tx interrupt again.

Ok, I think I get you here. For 'current' you mean the rfc I post.

^ permalink raw reply

* Re: [PATCH RFC] virtio_net: enable tx interrupt
From: Jason Wang @ 2014-10-15  4:33 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel; +Cc: netdev, virtualization
In-Reply-To: <20141014231154.GA23806@redhat.com>

On 10/15/2014 07:11 AM, Michael S. Tsirkin wrote:
> On Wed, Oct 15, 2014 at 12:53:59AM +0300, Michael S. Tsirkin wrote:
>> >  static void skb_xmit_done(struct virtqueue *vq)
>> >  {
>> >  	struct virtnet_info *vi = vq->vdev->priv;
>> > +	struct send_queue *sq = &vi->sq[vq2txq(vq)];
>> >  
>> > -	/* Suppress further interrupts. */
>> > -	virtqueue_disable_cb(vq);
>> > -
> One note here: current code seems racy because of doing
> virtqueue_disable_cb from skb_xmit_done that I'm dropping here: there's
> no guarantee we don't get an interrupt while tx ring is running, and if
> that happens we can end up with interrupts disabled forever.
>

Looks harmless since:

- if event index is enabled, virtqueue_disable_cb() does nothing in fact.
- if event index is disabled, we don't depend on tx interrupt and when
num_free is low we will try to enable the tx interrupt again.

^ permalink raw reply

* Re: [PATCH net v2 0/4] ipv6 and related cleanup for cxgb4/cxgb4i
From: David Miller @ 2014-10-15  4:30 UTC (permalink / raw)
  To: anish
  Cc: netdev, linux-scsi, hch, jbottomley, kxie, manojmalviya,
	hariprasad, leedom, swise
In-Reply-To: <1413342444-16388-1-git-send-email-anish@chelsio.com>

From: Anish Bhatt <anish@chelsio.com>
Date: Tue, 14 Oct 2014 20:07:20 -0700

> This patch set removes some duplicated/extraneous code from cxgb4i, guards
> cxgb4 against compilation failure based on ipv6 tristate, make ipv6 related
> code no longer be enabled by default irrespective of ipv6 tristate and fixes
> a refcnt issue.
> -Anish
> 
> v2 : Provide more detailed commit messages, make subject more concise as 
> recommended by Dave Miller.

Much better, series applied, thanks!

^ permalink raw reply

* Re: [PATCH] net: use hardware buffer pool to allocate skb
From: David Miller @ 2014-10-15  4:26 UTC (permalink / raw)
  To: eric.dumazet; +Cc: Jiafei.Pan, jkosina, netdev, LeoLi, linux-doc
In-Reply-To: <1413346503.17109.23.camel@edumazet-glaptop2.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 14 Oct 2014 21:15:03 -0700

> Take a look at how drivers can 'lock' pages already, and build skb
> sith page frags. It is already there.

+1

^ permalink raw reply

* Re: [PATCH] net: use hardware buffer pool to allocate skb
From: David Miller @ 2014-10-15  4:25 UTC (permalink / raw)
  To: Jiafei.Pan; +Cc: jkosina, netdev, LeoLi, linux-doc
In-Reply-To: <1413343571-33231-1-git-send-email-Jiafei.Pan@freescale.com>

From: Pan Jiafei <Jiafei.Pan@freescale.com>
Date: Wed, 15 Oct 2014 11:26:11 +0800

> In some platform, there are some hardware block provided
> to manage buffers to improve performance. So in some case,
> it is expected that the packets received by some generic
> NIC should be put into such hardware managed buffers
> directly, so that such buffer can be released by hardware
> or by driver.
> 
> This patch provide such general APIs for generic NIC to
> use hardware block managed buffers without any modification
> for generic NIC drivers.

Why are existing interfaces insufficent for your needs?

Several ethernet drivers already build SKBs from block
buffer pools.

They allocate pools of pages which the hardware divides into various
powers of 2, then the RX descriptor says what pieces of which pools
were used to hold the data for a packet, and then the SKB is
constructed with page frags pointing to those locations.

It's very cheap, inexpensive, and existing APIs are considered to
cover all use cases.

^ permalink raw reply

* Re: [PATCH net 0/3] SCTP fixes
From: David Miller @ 2014-10-15  4:21 UTC (permalink / raw)
  To: nhorman; +Cc: dborkman, linux-sctp, netdev
In-Reply-To: <20141015030631.GA6778@localhost.localdomain>

From: Neil Horman <nhorman@tuxdriver.com>
Date: Tue, 14 Oct 2014 23:06:32 -0400

> On Tue, Oct 14, 2014 at 12:46:57PM -0400, David Miller wrote:
>> From: Daniel Borkmann <dborkman@redhat.com>
>> Date: Thu,  9 Oct 2014 22:55:30 +0200
>> 
>> > Here are some SCTP fixes.
>> > 
>> > [ Note, immediate workaround would be to disable ASCONF (it
>> >   is sysctl disabled by default). It is actually only used
>> >   together with chunk authentication. ]
>> 
>> Series applied, thanks guys.
> Why did you apply this Dave? There was ongoing discussion around it.

Sorry Neil, that wasn't my impression.

Worry not I can revert or apply relative fixes on top.
:-)

Thanks.

^ permalink raw reply

* Re: [PATCH] net: use hardware buffer pool to allocate skb
From: Eric Dumazet @ 2014-10-15  4:15 UTC (permalink / raw)
  To: Pan Jiafei; +Cc: davem, jkosina, netdev, LeoLi, linux-doc
In-Reply-To: <1413343571-33231-1-git-send-email-Jiafei.Pan@freescale.com>

On Wed, 2014-10-15 at 11:26 +0800, Pan Jiafei wrote:
> In some platform, there are some hardware block provided
> to manage buffers to improve performance. So in some case,
> it is expected that the packets received by some generic
> NIC should be put into such hardware managed buffers
> directly, so that such buffer can be released by hardware
> or by driver.

You repeat 'some' four times.

> 
> This patch provide such general APIs for generic NIC to
> use hardware block managed buffers without any modification
> for generic NIC drivers.

...

> In this patch, the following fields are added to "net_device":
>     void *hw_skb_priv;
>     struct sk_buff *(*alloc_hw_skb)(void *hw_skb_priv, unsigned int length);
>     void (*free_hw_skb)(struct sk_buff *skb);
> so in order to let generic NIC driver to use hardware managed
> buffers, the function "alloc_hw_skb" and "free_hw_skb"
> provide implementation for allocate and free hardware managed
> buffers. "hw_skb_priv" is provided to pass some private data for
> these two functions.
> 
> When the socket buffer is allocated by these APIs, "hw_skb_state"
> is provided in struct "sk_buff". this argument can indicate
> that the buffer is hardware managed buffer, this buffer
> should freed by software or by hardware.
> 
> Documentation on how to use this featue can be found at
> <file:Documentation/networking/hw_skb.txt>.
> 
> Signed-off-by: Pan Jiafei <Jiafei.Pan@freescale.com>


I am giving a strong NACK, of course.

We are not going to grow sk_buff and add yet another conditional in fast
path for a very obscure feature like that.

Memory management is not going to be done by drivers.

The way it should work is that if your hardware has specific needs, rx
and tx paths (of the driver) need to make the needed adaptation.
Not the other way.

We already have complex skb layouts, we do not need a new one. 

Take a look at how drivers can 'lock' pages already, and build skb sith
page frags. It is already there.



^ permalink raw reply

* Re: [PATCH net-next RFC 3/3] virtio-net: conditionally enable tx interrupt
From: Jason Wang @ 2014-10-15  3:34 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm, netdev, linux-kernel, virtualization, linux-api
In-Reply-To: <20141014215146.GB23344@redhat.com>

On 10/15/2014 05:51 AM, Michael S. Tsirkin wrote:
> On Sat, Oct 11, 2014 at 03:16:46PM +0800, Jason Wang wrote:
>> > We free transmitted packets in ndo_start_xmit() in the past to get better
>> > performance in the past. One side effect is that skb_orphan() needs to be
>> > called in ndo_start_xmit() which makes sk_wmem_alloc not accurate in
>> > fact. For TCP protocol, this means several optimization could not work well
>> > such as TCP small queue and auto corking. This can lead extra low
>> > throughput of small packets stream.
>> > 
>> > Thanks to the urgent descriptor support. This patch tries to solve this
>> > issue by enable the tx interrupt selectively for stream packets. This means
>> > we don't need to orphan TCP stream packets in ndo_start_xmit() but enable
>> > tx interrupt for those packets. After we get tx interrupt, a tx napi was
>> > scheduled to free those packets.
>> > 
>> > With this method, sk_wmem_alloc of TCP socket were more accurate than in
>> > the past which let TCP can batch more through TSQ and auto corking.
>> > 
>> > Signed-off-by: Jason Wang <jasowang@redhat.com>
>> > ---
>> >  drivers/net/virtio_net.c | 164 ++++++++++++++++++++++++++++++++++++-----------
>> >  1 file changed, 128 insertions(+), 36 deletions(-)
>> > 
>> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> > index 5810841..b450fc4 100644
>> > --- a/drivers/net/virtio_net.c
>> > +++ b/drivers/net/virtio_net.c
>> > @@ -72,6 +72,8 @@ struct send_queue {
>> >  
>> >  	/* Name of the send queue: output.$index */
>> >  	char name[40];
>> > +
>> > +	struct napi_struct napi;
>> >  };
>> >  
>> >  /* Internal representation of a receive virtqueue */
>> > @@ -217,15 +219,40 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
>> >  	return p;
>> >  }
>> >  
>> > +static int free_old_xmit_skbs(struct send_queue *sq, int budget)
>> > +{
>> > +	struct sk_buff *skb;
>> > +	unsigned int len;
>> > +	struct virtnet_info *vi = sq->vq->vdev->priv;
>> > +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>> > +	int sent = 0;
>> > +
>> > +	while (sent < budget &&
>> > +	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
>> > +		pr_debug("Sent skb %p\n", skb);
>> > +
>> > +		u64_stats_update_begin(&stats->tx_syncp);
>> > +		stats->tx_bytes += skb->len;
>> > +		stats->tx_packets++;
>> > +		u64_stats_update_end(&stats->tx_syncp);
>> > +
>> > +		dev_kfree_skb_any(skb);
>> > +		sent++;
>> > +	}
>> > +
>> > +	return sent;
>> > +}
>> > +
>> >  static void skb_xmit_done(struct virtqueue *vq)
>> >  {
>> >  	struct virtnet_info *vi = vq->vdev->priv;
>> > +	struct send_queue *sq = &vi->sq[vq2txq(vq)];
>> >  
>> > -	/* Suppress further interrupts. */
>> > -	virtqueue_disable_cb(vq);
>> > -
>> > -	/* We were probably waiting for more output buffers. */
>> > -	netif_wake_subqueue(vi->dev, vq2txq(vq));
>> > +	if (napi_schedule_prep(&sq->napi)) {
>> > +		virtqueue_disable_cb(vq);
>> > +		virtqueue_disable_cb_urgent(vq);
> This disable_cb is no longer safe in xmit_done callback,
> since queue can be running at the same time.
>
> You must do it under tx lock. And yes, this likely will not work
> work well without event_idx. We'll probably need extra
> synchronization for such old hosts.
>
>
>

Yes, and the virtqueue_enable_cb_prepare() in virtnet_poll_tx() needs to
be synced with virtqueue_enabled_cb_dealyed(). Otherwise an old idx will
be published and we may still see tx interrupt storm.

^ permalink raw reply

* Re: Unable to handle kernel NULL pointer dereference at 0000000000000088 RIP:  [<ffffffff881ba167>] :bnx2:bnx2_poll_work+0xc7/0x1253
From: Joe Jin @ 2014-10-15  3:29 UTC (permalink / raw)
  To: zheng.li, Sony Chacko, Dept-HSGLinuxNICDev
  Cc: Michael Chan, netdev@vger.kernel.org
In-Reply-To: <5438F6C6.4040309@oracle.com>

Copy to new maintainer from QLogic and netdev.

Thanks,
Joe
On 10/11/14 17:22, zheng.li wrote:
> Hi Michael,
> I encounter a null pointer in bnx2_poll_work,
> after analyzed the vmcore found:
> tx_prod = 27708 and hw_cons = bnx2_get_hw_tx_cons(bnapi) = 27722;
> hw_cons > tx_prod;
> so the root cause is mostly HW sent data count is larger than stack
> provide in bnx2_start_xmit to cause memory override, normally HW just
> can sent data maximum is tx_prod, but don't know why HW sent data more
> than tx_prod 14 data.
> 
> Can you help to look at the issue? we encounter several times.
> bnx2 driver is 2.1.11,
> #define DRV_MODULE_VERSION	"2.1.11"
> #define DRV_MODULE_RELDATE	"July 20, 2011"
> Kernel version is : 2.6.18-371.1.2.0.1
> 
> 
> vmcore show infor is below:
> 
> crash64> bnx2 ffff81122c650500
> struct bnx2 {
>   regview = 0xffffc200100e0000,
>   dev = 0xffff81122c650000,
>   pdev = 0xffff81122f0c9000,
>   intr_sem = {
>     counter = 0
>   },
>   flags = 22404,
>   bnx2_napi = {{
>       dummy_netdev = 0xffff81242ae3e800,
>       bp = 0xffff81122c650500,
>       status_blk = {
>         msi = 0xffff81122391b000,
>         msix = 0xffff81122391b000
>       },
>       hw_tx_cons_ptr = 0xffff81122391b00a,
>       hw_rx_cons_ptr = 0xffff81122391b012,
>       last_status_idx = 65048,
>       int_num = 0,
>       cnic_tag = 0,
>       cnic_present = 0,
>       rx_ring = {
>         rx_prod_bseq = 1540471240,
>         rx_prod = 21188,
>         rx_cons = 20932,
>         rx_bidx_addr = 65540,
>         rx_bseq_addr = 65544,
>         rx_pg_bidx_addr = 65604,
>         rx_pg_prod = 0,
>         rx_pg_cons = 0,
>         rx_buf_ring = 0xffffc20014a7b000,
>         rx_desc_ring = {0xffff81122b8a0000, 0x0, 0x0, 0x0, 0x0, 0x0,
> 0x0, 0x0},
>         rx_pg_ring = 0x0,
>         rx_pg_desc_ring = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
>         rx_desc_mapping = {78039875584, 0, 0, 0, 0, 0, 0, 0},
>         rx_pg_desc_mapping = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
>       },
>       tx_ring = {
>         tx_prod_bseq = 3702584042,
>         tx_prod = 27708,
>         tx_bidx_addr = 69768,
>         tx_bseq_addr = 69776,
>         tx_desc_ring = 0xffff811227890000,
>         tx_buf_ring = 0xffff81242f510000,
>         tx_cons = 27603,
>         hw_tx_cons = 27603,
>         tx_desc_mapping = 77972701184
>       }
>     },
> 
> crash64> rd 0xffff81122391b00a
> ffff81122391b00a:  0000000000006c4a
> hw_cons = 6c4a = 27722;
> 
> usr/src/debug/kernel-2.6.18/linux-2.6.18-371.1.2.0.1.el5.x86_64/include/linux/skbuff.h:
> 921
> 0xffffffff881ba167 <bnx2_poll_work+199>:921>:   mov    0x88(%r13),%edx
> R13: 0000000000000000
> R13 is skb which is NULL at that moment.
> 
> Had refer https://access.redhat.com/solutions/341183
> and
> http://kernel.opensuse.org/cgit/kernel/commit/?id=c1f5163de417dab01fa9daaf09a74bbb19303f3c
> but can't exactly know which case our bug hit.
> 
> Thanks,
> James Li
> 

^ permalink raw reply

* [PATCH] net: use hardware buffer pool to allocate skb
From: Pan Jiafei @ 2014-10-15  3:26 UTC (permalink / raw)
  To: davem, jkosina; +Cc: netdev, Jiafei.Pan, LeoLi, linux-doc

In some platform, there are some hardware block provided
to manage buffers to improve performance. So in some case,
it is expected that the packets received by some generic
NIC should be put into such hardware managed buffers
directly, so that such buffer can be released by hardware
or by driver.

This patch provide such general APIs for generic NIC to
use hardware block managed buffers without any modification
for generic NIC drivers.
In this patch, the following fields are added to "net_device":
    void *hw_skb_priv;
    struct sk_buff *(*alloc_hw_skb)(void *hw_skb_priv, unsigned int length);
    void (*free_hw_skb)(struct sk_buff *skb);
so in order to let generic NIC driver to use hardware managed
buffers, the function "alloc_hw_skb" and "free_hw_skb"
provide implementation for allocate and free hardware managed
buffers. "hw_skb_priv" is provided to pass some private data for
these two functions.

When the socket buffer is allocated by these APIs, "hw_skb_state"
is provided in struct "sk_buff". this argument can indicate
that the buffer is hardware managed buffer, this buffer
should freed by software or by hardware.

Documentation on how to use this featue can be found at
<file:Documentation/networking/hw_skb.txt>.

Signed-off-by: Pan Jiafei <Jiafei.Pan@freescale.com>
---
 Documentation/networking/hw_skb.txt | 117 ++++++++++++++++++++++++++++++++++++
 include/linux/netdevice.h           |   5 ++
 include/linux/skbuff.h              |  16 +++++
 net/Kconfig                         |  10 +++
 net/core/skbuff.c                   |  28 +++++++++
 5 files changed, 176 insertions(+)
 create mode 100644 Documentation/networking/hw_skb.txt

diff --git a/Documentation/networking/hw_skb.txt b/Documentation/networking/hw_skb.txt
new file mode 100644
index 0000000..256f3fc
--- /dev/null
+++ b/Documentation/networking/hw_skb.txt
@@ -0,0 +1,117 @@
+Document for using hardware managed SKB.
+
+1. Description
+
+In some platform, there are some hardware block provided
+to manage buffers to improve performance. So in some case,
+it is expected that the packets received by some generic
+NIC should be put into such hardware managed buffers
+directly, so that such buffer can be released by hardware
+or by driver.
+
+2. Related Struct Definition
+
+Some general APIs are provided for generic NIC to use hardware
+block managed buffers without any modification for generic NIC
+drivers.
+
+1)Kernel Configuration Item
+
+	"CONFIG_USE_HW_SKB"
+
+2)The DEVICE structure
+
+	struct net_device {
+		...
+	#ifdef CONFIG_USE_HW_SKB
+		void *hw_skb_priv;
+		struct sk_buff *(*alloc_hw_skb)(void *hw_skb_priv, unsigned int length);
+		void (*free_hw_skb)(struct sk_buff *skb);
+	#endif
+		...
+	}
+
+"hw_skb_priv" is private data for "alloc_hw_skb" and "free_hw_skb" functions.
+"alloc_hw_skb" is for allocating skb by using hardware managed buffer.
+"free_hw_skb" is for freeing skb allocated by hardware manager buffer.
+
+3)struct sk_buff - socket buffer
+
+	struct sk_buff {
+		...
+	#ifdef CONFIG_SKB_USE_HW_BP
+		__u32 			hw_skb_state;
+		void			*hw_skb_priv;
+		void 			(*free_hw_skb)(struct sk_buff *skb);
+	#endif
+		...
+	}
+
+	/* hw_skb_state list */
+	enum hw_skb_state {
+		/* If set, SKB use hardware managed buffer */
+		IS_HW_SKB = 1 << 0,
+		/* If set, and skb can be freed by software by calling
+		 * netdev->free_hw_skb
+		 */
+		HW_SKB_SW_FREE = 1 << 1,
+	};
+
+"hw_skb_priv" and "free_hw_skb" are the same with the field in the
+struct "net_device"
+
+After calling "alloc_hw_skb" to allocate skb by using hardware managed
+buffers, "hw_skb_priv" and "free_hw_skb" is set in SKB driver:
+	skb->hw_skb_priv = dev->hw_skb_priv;
+	skb->free_hw_skb = dev->free_hw_skb;
+So that when "struct net_device	*dev" is changed after the skb is allocated,
+It is be confirmed that this skb can be freed by the method synced
+with allocation.
+
+"hw_skb_state" indicates that the state of SKB. When the skb is allocated
+by "alloc_hw_skb" function, the flag of "IS_HW_SKB" is set by
+"__netdev_alloc_skb" function in skbuff.c when returned from "alloc_hw_skb".
+But in "alloc_hw_skb", "HW_SKB_SW_FREE" must be set if the skb should be
+freed by calling "free_hw_skb", otherwise, the skb will never be freed by
+any driver until it is freed by hardware block.
+
+SKB using hardware managed buffer is not recycleable.
+
+3. How to use this feature
+
+For example, driver "A" wants the third-party NIC driver "B" to
+store the data in some hardware managed buffer then send to "A".
+
+1) Select "CONFIG_USE_HW_SKB" to enable this feature.
+
+2) In driver "A", implement the function "alloc_hw_skb" and
+"free_hw_skb". For example:
+
+struct sk_buff *alloc_hw_skb(void *priv, unsigned int length)
+{
+	buf = alloc_hw_buffer();
+	skb = build_skb(buf, ...);
+	if (skb)
+		skb->hw_skb_state |= HW_SKB_SW_FREE;
+
+	return skb;
+}
+
+void free_hw_skb(struct sk_buff *skb)
+{
+	free_hw_buffer(skb->head);
+}
+
+3) In driver "A", get "net_device" handle of net device case using
+driver "B".
+	...
+	net_dev_b->hw_skb_priv = priv;
+	net_dev_b->alloc_hw_skb = alloc_hw_skb;
+	net_dev_b->free_hw_skb = free_hw_skb;
+	...
+
+4) Then, when driver "B" wants to allocate skb, "alloc_hw_skb"
+will be called to allocate hardware manager skb firstly, if
+failed, the normal skb will also be allocate, if successed,
+the skb will be freed by calling free_hw_skb when "kfree_skb"
+is called to free this skb.
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 838407a..42b6158 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1689,6 +1689,11 @@ struct net_device {
 	struct lock_class_key *qdisc_tx_busylock;
 	int group;
 	struct pm_qos_request	pm_qos_req;
+#ifdef CONFIG_USE_HW_SKB
+	void *hw_skb_priv;
+	struct sk_buff *(*alloc_hw_skb)(void *hw_skb_priv, unsigned int length);
+	void (*free_hw_skb)(struct sk_buff *skb);
+#endif
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 776104b..d9afdeb 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -436,6 +436,16 @@ static inline u32 skb_mstamp_us_delta(const struct skb_mstamp *t1,
 }
 
 
+/* hw_skb_state list */
+enum hw_skb_state {
+	/* If set, SKB use hardware managed buffer */
+	IS_HW_SKB = 1 << 0,
+	/* If set, and skb can be freed by software by calling
+	 * netdev->free_hw_skb
+	 */
+	HW_SKB_SW_FREE = 1 << 1,
+};
+
 /** 
  *	struct sk_buff - socket buffer
  *	@next: Next buffer in list
@@ -646,6 +656,12 @@ struct sk_buff {
 	__u16			network_header;
 	__u16			mac_header;
 
+#ifdef CONFIG_USE_HW_SKB
+	__u32			hw_skb_state;
+	void			*hw_skb_priv;
+	void			(*free_hw_skb)(struct sk_buff *skb);
+#endif
+
 	__u32			headers_end[0];
 
 	/* These elements must be at the end, see alloc_skb() for details.  */
diff --git a/net/Kconfig b/net/Kconfig
index d6b138e..346e021 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -291,6 +291,16 @@ config NET_FLOW_LIMIT
 	  with many clients some protection against DoS by a single (spoofed)
 	  flow that greatly exceeds average workload.
 
+config USE_HW_SKB
+	bool "NIC use hardware managed buffer to build skb"
+	depends on INET
+	---help---
+	  If select this, the third party drivers will use hardware managed
+	  buffers to allocate SKB without any modification for the driver.
+
+	  Documentation on how to use this featue can be found at
+	  <file:Documentation/networking/hw_skb.txt>.
+
 menu "Network testing"
 
 config NET_PKTGEN
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7b3df0d..f8603e5 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -415,6 +415,19 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
 	unsigned int fragsz = SKB_DATA_ALIGN(length + NET_SKB_PAD) +
 			      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 
+#ifdef CONFIG_USE_HW_SKB
+	if (dev->alloc_hw_skb) {
+		skb = dev->alloc_hw_skb(dev->hw_skb_priv, length);
+		if (likely(skb)) {
+			skb->hw_skb_state |= IS_HW_SKB;
+			skb->hw_skb_priv = dev->hw_skb_priv;
+			skb->free_hw_skb = dev->free_hw_skb;
+			skb_reserve(skb, NET_SKB_PAD);
+			skb->dev = dev;
+			return skb;
+		}
+	}
+#endif
 	if (fragsz <= PAGE_SIZE && !(gfp_mask & (__GFP_WAIT | GFP_DMA))) {
 		void *data;
 
@@ -432,6 +445,7 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
 		skb = __alloc_skb(length + NET_SKB_PAD, gfp_mask,
 				  SKB_ALLOC_RX, NUMA_NO_NODE);
 	}
+
 	if (likely(skb)) {
 		skb_reserve(skb, NET_SKB_PAD);
 		skb->dev = dev;
@@ -483,6 +497,15 @@ static void skb_clone_fraglist(struct sk_buff *skb)
 
 static void skb_free_head(struct sk_buff *skb)
 {
+#ifdef CONFIG_USE_HW_SKB
+	if (skb->hw_skb_state & IS_HW_SKB) {
+		if (skb->hw_skb_state & HW_SKB_SW_FREE) {
+			BUG_ON(!skb->free_hw_skb);
+			skb->free_hw_skb(skb);
+		}
+		return;
+	}
+#endif
 	if (skb->head_frag)
 		put_page(virt_to_head_page(skb->head));
 	else
@@ -506,6 +529,10 @@ static void skb_release_data(struct sk_buff *skb)
 	 * If skb buf is from userspace, we need to notify the caller
 	 * the lower device DMA has done;
 	 */
+#ifdef CONFIG_USE_HW_SKB
+	if (skb->hw_skb_state & IS_HW_SKB)
+		goto skip_callback;
+#endif
 	if (shinfo->tx_flags & SKBTX_DEV_ZEROCOPY) {
 		struct ubuf_info *uarg;
 
@@ -514,6 +541,7 @@ static void skb_release_data(struct sk_buff *skb)
 			uarg->callback(uarg, true);
 	}
 
+skip_callback:
 	if (shinfo->frag_list)
 		kfree_skb_list(shinfo->frag_list);
 
-- 
2.1.0.27.g96db324


^ permalink raw reply related

* Re: [PATCH net-next RFC 0/3] virtio-net: Conditionally enable tx interrupt
From: Jason Wang @ 2014-10-15  3:24 UTC (permalink / raw)
  To: Michael S. Tsirkin, David Miller
  Cc: kvm, netdev, linux-kernel, virtualization, linux-api
In-Reply-To: <20141014215133.GA23344@redhat.com>

On 10/15/2014 05:51 AM, Michael S. Tsirkin wrote:
> On Tue, Oct 14, 2014 at 02:53:27PM -0400, David Miller wrote:
>> From: Jason Wang <jasowang@redhat.com>
>> Date: Sat, 11 Oct 2014 15:16:43 +0800
>>
>>> We free old transmitted packets in ndo_start_xmit() currently, so any
>>> packet must be orphaned also there. This was used to reduce the overhead of
>>> tx interrupt to achieve better performance. But this may not work for some
>>> protocols such as TCP stream. TCP depends on the value of sk_wmem_alloc to
>>> implement various optimization for small packets stream such as TCP small
>>> queue and auto corking. But orphaning packets early in ndo_start_xmit()
>>> disable such things more or less since sk_wmem_alloc was not accurate. This
>>> lead extra low throughput for TCP stream of small writes.
>>>
>>> This series tries to solve this issue by enable tx interrupts for all TCP
>>> packets other than the ones with push bit or pure ACK. This is done through
>>> the support of urgent descriptor which can force an interrupt for a
>>> specified packet. If tx interrupt was enabled for a packet, there's no need
>>> to orphan it in ndo_start_xmit(), we can free it tx napi which is scheduled
>>> by tx interrupt. Then sk_wmem_alloc was more accurate than before and TCP
>>> can batch more for small write. More larger skb was produced by TCP in this
>>> case to improve both throughput and cpu utilization.
>>>
>>> Test shows great improvements on small write tcp streams. For most of the
>>> other cases, the throughput and cpu utilization are the same in the
>>> past. Only few cases, more cpu utilization was noticed which needs more
>>> investigation.
>>>
>>> Review and comments are welcomed.
>> I think proper accounting and queueing (at all levels, not just TCP
>> sockets) is more important than trying to skim a bunch of cycles by
>> avoiding TX interrupts.
>>
>> Having an event to free the SKB is absolutely essential for the stack
>> to operate correctly.
>>
>> And with virtio-net you don't even have the excuse of "the HW
>> unfortunately doesn't have an appropriate TX event."
>>
>> So please don't play games, and instead use TX interrupts all the
>> time.  You can mitigate them in various ways, but don't turn them on
>> selectively based upon traffic type, that's terrible.
> This got me thinking: how about using virtqueue_enable_cb_delayed
> for this mitigation?

Should work, another possible solution is interrupt coalescing, which
can speed up also the case without event index.
> It's pretty easy to implement - I'll send a proof of concept patch
> separately.

^ permalink raw reply

* [PATCH net v2 0/4] ipv6 and related cleanup for cxgb4/cxgb4i
From: Anish Bhatt @ 2014-10-15  3:07 UTC (permalink / raw)
  To: netdev, linux-scsi
  Cc: davem, hch, jbottomley, kxie, manojmalviya, hariprasad, leedom,
	swise, Anish Bhatt

This patch set removes some duplicated/extraneous code from cxgb4i, guards
cxgb4 against compilation failure based on ipv6 tristate, make ipv6 related
code no longer be enabled by default irrespective of ipv6 tristate and fixes
a refcnt issue.
-Anish

v2 : Provide more detailed commit messages, make subject more concise as 
recommended by Dave Miller.

Anish Bhatt (4):
  cxgb4i : Remove duplicated CLIP handling code
  cxgb4 : Fix build failure in cxgb4 when ipv6 is disabled/not in-built
  cxgb4i : Fix -Wunused-function warning
  cxgb4i: Remove duplicate call to dst_neigh_lookup()

 drivers/net/ethernet/chelsio/Kconfig            |   2 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |  15 +++
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c              | 146 ++----------------------
 drivers/scsi/cxgbi/libcxgbi.c                   |   2 +
 4 files changed, 26 insertions(+), 139 deletions(-)

-- 
2.1.2

^ permalink raw reply

* [PATCH net v2 3/4] cxgb4i : Fix -Wunused-function warning
From: Anish Bhatt @ 2014-10-15  3:07 UTC (permalink / raw)
  To: netdev, linux-scsi
  Cc: davem, hch, jbottomley, kxie, manojmalviya, hariprasad, leedom,
	swise, Anish Bhatt
In-Reply-To: <1413342444-16388-1-git-send-email-anish@chelsio.com>

A bunch of ipv6 related code is left on by default. While this causes no
compilation issues, there is no need to have this enabled by default. Guard
with an ipv6 check, which also takes care of a -Wunused-function warning.

Signed-off-by: Anish Bhatt <anish@chelsio.com>
---
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c | 8 ++++++++
 drivers/scsi/cxgbi/libcxgbi.c      | 2 ++
 2 files changed, 10 insertions(+)

diff --git a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
index 18d0d1c..df176f0 100644
--- a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
+++ b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
@@ -259,6 +259,7 @@ static void send_act_open_req(struct cxgbi_sock *csk, struct sk_buff *skb,
 	cxgb4_l2t_send(csk->cdev->ports[csk->port_id], skb, csk->l2t);
 }
 
+#if IS_ENABLED(CONFIG_IPV6)
 static void send_act_open_req6(struct cxgbi_sock *csk, struct sk_buff *skb,
 			       struct l2t_entry *e)
 {
@@ -344,6 +345,7 @@ static void send_act_open_req6(struct cxgbi_sock *csk, struct sk_buff *skb,
 
 	cxgb4_l2t_send(csk->cdev->ports[csk->port_id], skb, csk->l2t);
 }
+#endif
 
 static void send_close_req(struct cxgbi_sock *csk)
 {
@@ -781,9 +783,11 @@ static void csk_act_open_retry_timer(unsigned long data)
 	if (csk->csk_family == AF_INET) {
 		send_act_open_func = send_act_open_req;
 		skb = alloc_wr(size, 0, GFP_ATOMIC);
+#if IS_ENABLED(CONFIG_IPV6)
 	} else {
 		send_act_open_func = send_act_open_req6;
 		skb = alloc_wr(size6, 0, GFP_ATOMIC);
+#endif
 	}
 
 	if (!skb)
@@ -1335,8 +1339,10 @@ static int init_act_open(struct cxgbi_sock *csk)
 
 	if (csk->csk_family == AF_INET)
 		skb = alloc_wr(size, 0, GFP_NOIO);
+#if IS_ENABLED(CONFIG_IPV6)
 	else
 		skb = alloc_wr(size6, 0, GFP_NOIO);
+#endif
 
 	if (!skb)
 		goto rel_resource;
@@ -1370,8 +1376,10 @@ static int init_act_open(struct cxgbi_sock *csk)
 	cxgbi_sock_set_state(csk, CTP_ACTIVE_OPEN);
 	if (csk->csk_family == AF_INET)
 		send_act_open_req(csk, skb, csk->l2t);
+#if IS_ENABLED(CONFIG_IPV6)
 	else
 		send_act_open_req6(csk, skb, csk->l2t);
+#endif
 	neigh_release(n);
 
 	return 0;
diff --git a/drivers/scsi/cxgbi/libcxgbi.c b/drivers/scsi/cxgbi/libcxgbi.c
index 6a2001d..54fa6e0 100644
--- a/drivers/scsi/cxgbi/libcxgbi.c
+++ b/drivers/scsi/cxgbi/libcxgbi.c
@@ -275,6 +275,7 @@ struct cxgbi_device *cxgbi_device_find_by_netdev_rcu(struct net_device *ndev,
 }
 EXPORT_SYMBOL_GPL(cxgbi_device_find_by_netdev_rcu);
 
+#if IS_ENABLED(CONFIG_IPV6)
 static struct cxgbi_device *cxgbi_device_find_by_mac(struct net_device *ndev,
 						     int *port)
 {
@@ -307,6 +308,7 @@ static struct cxgbi_device *cxgbi_device_find_by_mac(struct net_device *ndev,
 		  ndev, ndev->name);
 	return NULL;
 }
+#endif
 
 void cxgbi_hbas_remove(struct cxgbi_device *cdev)
 {
-- 
2.1.2

^ permalink raw reply related

* [PATCH net v2 1/4] cxgb4i : Remove duplicated CLIP handling code
From: Anish Bhatt @ 2014-10-15  3:07 UTC (permalink / raw)
  To: netdev, linux-scsi
  Cc: davem, hch, jbottomley, kxie, manojmalviya, hariprasad, leedom,
	swise, Anish Bhatt
In-Reply-To: <1413342444-16388-1-git-send-email-anish@chelsio.com>

cxgb4 already handles CLIP updates from a previous changeset for iw_cxgb4,
there is no need to have this functionality in cxgb4i. Remove duplicated code

Signed-off-by: Anish Bhatt <anish@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |   7 ++
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c              | 133 ------------------------
 2 files changed, 7 insertions(+), 133 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index fed8f26..411acf0 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -4490,6 +4490,13 @@ static int update_root_dev_clip(struct net_device *dev)
 		return ret;
 
 	/* Parse all bond and vlan devices layered on top of the physical dev */
+	root_dev = netdev_master_upper_dev_get_rcu(dev);
+	if (root_dev) {
+		ret = update_dev_clip(root_dev, dev);
+		if (ret)
+			return ret;
+	}
+
 	for (i = 0; i < VLAN_N_VID; i++) {
 		root_dev = __vlan_find_dev_deep_rcu(dev, htons(ETH_P_8021Q), i);
 		if (!root_dev)
diff --git a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
index 02e69e7..18d0d1c 100644
--- a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
+++ b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
@@ -1635,129 +1635,6 @@ static int cxgb4i_ddp_init(struct cxgbi_device *cdev)
 	return 0;
 }
 
-#if IS_ENABLED(CONFIG_IPV6)
-static int cxgbi_inet6addr_handler(struct notifier_block *this,
-				   unsigned long event, void *data)
-{
-	struct inet6_ifaddr *ifa = data;
-	struct net_device *event_dev = ifa->idev->dev;
-	struct cxgbi_device *cdev;
-	int ret = NOTIFY_DONE;
-
-	if (event_dev->priv_flags & IFF_802_1Q_VLAN)
-		event_dev = vlan_dev_real_dev(event_dev);
-
-	cdev = cxgbi_device_find_by_netdev_rcu(event_dev, NULL);
-
-	if (!cdev)
-		return ret;
-
-	switch (event) {
-	case NETDEV_UP:
-		ret = cxgb4_clip_get(event_dev,
-				     (const struct in6_addr *)
-				     ((ifa)->addr.s6_addr));
-		if (ret < 0)
-			return ret;
-
-		ret = NOTIFY_OK;
-		break;
-
-	case NETDEV_DOWN:
-		cxgb4_clip_release(event_dev,
-				   (const struct in6_addr *)
-				   ((ifa)->addr.s6_addr));
-		ret = NOTIFY_OK;
-		break;
-
-	default:
-		break;
-	}
-
-	return ret;
-}
-
-static struct notifier_block cxgbi_inet6addr_notifier = {
-	.notifier_call = cxgbi_inet6addr_handler
-};
-
-/* Retrieve IPv6 addresses from a root device (bond, vlan) associated with
- * a physical device.
- * The physical device reference is needed to send the actual CLIP command.
- */
-static int update_dev_clip(struct net_device *root_dev, struct net_device *dev)
-{
-	struct inet6_dev *idev = NULL;
-	struct inet6_ifaddr *ifa;
-	int ret = 0;
-
-	idev = __in6_dev_get(root_dev);
-	if (!idev)
-		return ret;
-
-	read_lock_bh(&idev->lock);
-	list_for_each_entry(ifa, &idev->addr_list, if_list) {
-		pr_info("updating the clip for addr %pI6\n",
-			ifa->addr.s6_addr);
-		ret = cxgb4_clip_get(dev, (const struct in6_addr *)
-				     ifa->addr.s6_addr);
-		if (ret < 0)
-			break;
-	}
-
-	read_unlock_bh(&idev->lock);
-	return ret;
-}
-
-static int update_root_dev_clip(struct net_device *dev)
-{
-	struct net_device *root_dev = NULL;
-	int i, ret = 0;
-
-	/* First populate the real net device's IPv6 address */
-	ret = update_dev_clip(dev, dev);
-	if (ret)
-		return ret;
-
-	/* Parse all bond and vlan devices layered on top of the physical dev */
-	root_dev = netdev_master_upper_dev_get(dev);
-	if (root_dev) {
-		ret = update_dev_clip(root_dev, dev);
-		if (ret)
-			return ret;
-	}
-
-	for (i = 0; i < VLAN_N_VID; i++) {
-		root_dev = __vlan_find_dev_deep_rcu(dev, htons(ETH_P_8021Q), i);
-		if (!root_dev)
-			continue;
-
-		ret = update_dev_clip(root_dev, dev);
-		if (ret)
-			break;
-	}
-	return ret;
-}
-
-static void cxgbi_update_clip(struct cxgbi_device *cdev)
-{
-	int i;
-
-	rcu_read_lock();
-
-	for (i = 0; i < cdev->nports; i++) {
-		struct net_device *dev = cdev->ports[i];
-		int ret = 0;
-
-		if (dev)
-			ret = update_root_dev_clip(dev);
-		if (ret < 0)
-			break;
-	}
-	rcu_read_unlock();
-}
-#endif /* IS_ENABLED(CONFIG_IPV6) */
-
 static void *t4_uld_add(const struct cxgb4_lld_info *lldi)
 {
 	struct cxgbi_device *cdev;
@@ -1876,10 +1753,6 @@ static int t4_uld_state_change(void *handle, enum cxgb4_state state)
 	switch (state) {
 	case CXGB4_STATE_UP:
 		pr_info("cdev 0x%p, UP.\n", cdev);
-#if IS_ENABLED(CONFIG_IPV6)
-		cxgbi_update_clip(cdev);
-#endif
-		/* re-initialize */
 		break;
 	case CXGB4_STATE_START_RECOVERY:
 		pr_info("cdev 0x%p, RECOVERY.\n", cdev);
@@ -1910,17 +1783,11 @@ static int __init cxgb4i_init_module(void)
 		return rc;
 	cxgb4_register_uld(CXGB4_ULD_ISCSI, &cxgb4i_uld_info);
 
-#if IS_ENABLED(CONFIG_IPV6)
-	register_inet6addr_notifier(&cxgbi_inet6addr_notifier);
-#endif
 	return 0;
 }
 
 static void __exit cxgb4i_exit_module(void)
 {
-#if IS_ENABLED(CONFIG_IPV6)
-	unregister_inet6addr_notifier(&cxgbi_inet6addr_notifier);
-#endif
 	cxgb4_unregister_uld(CXGB4_ULD_ISCSI);
 	cxgbi_device_unregister_all(CXGBI_FLAG_DEV_T4);
 	cxgbi_iscsi_cleanup(&cxgb4i_iscsi_transport, &cxgb4i_stt);
-- 
2.1.2

^ permalink raw reply related

* [PATCH net v2 4/4] cxgb4i: Remove duplicate call to dst_neigh_lookup()
From: Anish Bhatt @ 2014-10-15  3:07 UTC (permalink / raw)
  To: netdev, linux-scsi
  Cc: davem, hch, jbottomley, kxie, manojmalviya, hariprasad, leedom,
	swise, Anish Bhatt
In-Reply-To: <1413342444-16388-1-git-send-email-anish@chelsio.com>

There is an extra call to dst_neigh_lookup() leftover in cxgb4i that can cause 
an unreleased refcnt issue. Remove extraneous call.

Signed-off-by: Anish Bhatt <anish@chelsio.com>

Fixes : 759a0cc5a3e1b ('cxgb4i: Add ipv6 code to driver, call into libcxgbi ipv6 api')
---
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
index df176f0..8c3003b 100644
--- a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
+++ b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
@@ -1317,11 +1317,6 @@ static int init_act_open(struct cxgbi_sock *csk)
 	cxgbi_sock_set_flag(csk, CTPF_HAS_ATID);
 	cxgbi_sock_get(csk);
 
-	n = dst_neigh_lookup(csk->dst, &csk->daddr.sin_addr.s_addr);
-	if (!n) {
-		pr_err("%s, can't get neighbour of csk->dst.\n", ndev->name);
-		goto rel_resource;
-	}
 	csk->l2t = cxgb4_l2t_get(lldi->l2t, n, ndev, 0);
 	if (!csk->l2t) {
 		pr_err("%s, cannot alloc l2t.\n", ndev->name);
-- 
2.1.2

^ permalink raw reply related

* [PATCH net v2 2/4] cxgb4 : Fix build failure in cxgb4 when ipv6 is disabled/not in-built
From: Anish Bhatt @ 2014-10-15  3:07 UTC (permalink / raw)
  To: netdev, linux-scsi
  Cc: davem, hch, jbottomley, kxie, manojmalviya, hariprasad, leedom,
	swise, Anish Bhatt
In-Reply-To: <1413342444-16388-1-git-send-email-anish@chelsio.com>

cxgb4 ipv6 does not guard against ipv6 being disabled, or the standard
ipv6 module vs inbuilt tri-state issue. This was fixed for cxgb4i & iw_cxgb4
but missed for cxgb4.

Signed-off-by: Anish Bhatt <anish@chelsio.com>
---
 drivers/net/ethernet/chelsio/Kconfig            | 2 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 8 ++++++++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/chelsio/Kconfig b/drivers/net/ethernet/chelsio/Kconfig
index c3ce9df..ac6473f 100644
--- a/drivers/net/ethernet/chelsio/Kconfig
+++ b/drivers/net/ethernet/chelsio/Kconfig
@@ -68,7 +68,7 @@ config CHELSIO_T3
 
 config CHELSIO_T4
 	tristate "Chelsio Communications T4/T5 Ethernet support"
-	depends on PCI
+	depends on PCI && (IPV6 || IPV6=n)
 	select FW_LOADER
 	select MDIO
 	---help---
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 411acf0..3f60070 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -4369,6 +4369,7 @@ EXPORT_SYMBOL(cxgb4_unregister_uld);
  * success (true) if it belongs otherwise failure (false).
  * Called with rcu_read_lock() held.
  */
+#if IS_ENABLED(CONFIG_IPV6)
 static bool cxgb4_netdev(const struct net_device *netdev)
 {
 	struct adapter *adap;
@@ -4529,6 +4530,7 @@ static void update_clip(const struct adapter *adap)
 	}
 	rcu_read_unlock();
 }
+#endif /* IS_ENABLED(CONFIG_IPV6) */
 
 /**
  *	cxgb_up - enable the adapter
@@ -4575,7 +4577,9 @@ static int cxgb_up(struct adapter *adap)
 	t4_intr_enable(adap);
 	adap->flags |= FULL_INIT_DONE;
 	notify_ulds(adap, CXGB4_STATE_UP);
+#if IS_ENABLED(CONFIG_IPV6)
 	update_clip(adap);
+#endif
  out:
 	return err;
  irq_err:
@@ -6869,14 +6873,18 @@ static int __init cxgb4_init_module(void)
 	if (ret < 0)
 		debugfs_remove(cxgb4_debugfs_root);
 
+#if IS_ENABLED(CONFIG_IPV6)
 	register_inet6addr_notifier(&cxgb4_inet6addr_notifier);
+#endif
 
 	return ret;
 }
 
 static void __exit cxgb4_cleanup_module(void)
 {
+#if IS_ENABLED(CONFIG_IPV6)
 	unregister_inet6addr_notifier(&cxgb4_inet6addr_notifier);
+#endif
 	pci_unregister_driver(&cxgb4_driver);
 	debugfs_remove(cxgb4_debugfs_root);  /* NULL ok */
 }
-- 
2.1.2


^ permalink raw reply related

* Re: [PATCH net 0/3] SCTP fixes
From: Neil Horman @ 2014-10-15  3:06 UTC (permalink / raw)
  To: David Miller; +Cc: dborkman, linux-sctp, netdev
In-Reply-To: <20141014.124657.128658692728989163.davem@davemloft.net>

On Tue, Oct 14, 2014 at 12:46:57PM -0400, David Miller wrote:
> From: Daniel Borkmann <dborkman@redhat.com>
> Date: Thu,  9 Oct 2014 22:55:30 +0200
> 
> > Here are some SCTP fixes.
> > 
> > [ Note, immediate workaround would be to disable ASCONF (it
> >   is sysctl disabled by default). It is actually only used
> >   together with chunk authentication. ]
> 
> Series applied, thanks guys.
Why did you apply this Dave? There was ongoing discussion around it.
Neil

> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* Re: [PATCH net 2/3] net: sctp: fix panic on duplicate ASCONF chunks
From: Neil Horman @ 2014-10-15  2:58 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: davem, linux-sctp, netdev, Vlad Yasevich
In-Reply-To: <543B0DD7.7000900@redhat.com>

On Mon, Oct 13, 2014 at 01:25:11AM +0200, Daniel Borkmann wrote:
> On 10/12/2014 03:42 AM, Neil Horman wrote:
> >On Sat, Oct 11, 2014 at 12:02:31AM +0200, Daniel Borkmann wrote:
> >>On 10/10/2014 05:39 PM, Neil Horman wrote:
> >>...
> >>>Is it worth adding a WARN_ON, to indicate that two ASCONF chunks have been
> >>>received with duplicate serials?
> >>
> >>Don't think so, as this would be triggerable from outside.
> >>
> >WARN_ON_ONCE then, per serial number?
> 
> Sorry, but no. If someone seriously runs that in production and it
> triggers a WARN() from outside, admins will start sending us bug
> reports that apparently something with the kernel code is wrong.
> 
> WARN() should only be used if we have some *internal* unexpected bug,
> but can still fail gracefully. This would neither be an actual code bug
> nor would it be an internally triggered one, plus we add unnecessary
> complexity to the code. Similarly, for those reasons we don't WARN()
> and throw a stack trace when we receive, say, an skb of invalid length
> elsewhere.
> 
> I'd also like to avoid any additional pr_debug().
> 
> I don't think people enable them in production, and if they really do,
> it's too late anyway as we already have received this chunk. If anything,
> I'd rather like to see debugging code further removed as we have already
> different facilities in the kernel for runtime debugging that are much
> more powerful.
What do you suggest then?  It seems like this is a protocol error that an
administrator will want to be made aware of.  I'm open to other options, but
just saying "no" isn't sufficient for me.
Neil

> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* Re: Netlink mmap tx security?
From: David Miller @ 2014-10-15  2:09 UTC (permalink / raw)
  To: luto; +Cc: torvalds, kaber, netdev, tgraf
In-Reply-To: <CALCETrW1HFJ3QXMfV9Hv922eax0hbHJn5GmHPvtED8_JR5KOVg@mail.gmail.com>

From: Andy Lutomirski <luto@amacapital.net>
Date: Tue, 14 Oct 2014 19:03:11 -0700

> On Tue, Oct 14, 2014 at 7:01 PM, David Miller <davem@davemloft.net> wrote:
>> I really think this means I'll have to remove all of the netlink
>> mmap() support in order to prevent from breaking applications. :(
>>
>> The other option is to keep NETLINK_TX_RING, but copy the data into
>> a kernel side buffer before acting upon it.
> 
> Option 3, which sucks but maybe not that badly: change the value of
> NETLINK_RX_RING.  (Practically: add NETLINK_RX_RING2 or something like
> that.)

That would work as well.

There are pros and cons to all of these approaches.

I was thinking that if we do the "TX mmap --> copy to kernel buffer"
approach, then if in the future we find a way to make it work
reliably, we can avoid the copy.  And frankly performance wise it's no
worse than what happens via normal sendmsg() calls.

And all applications using NETLINK_RX_RING keep working and keep
getting the performance boost.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox