Netdev List

* Re: Netlink mmap tx security?
From: David Miller @ 2014-10-15  2:09 UTC
  To: luto; +Cc: torvalds, kaber, netdev, tgraf
In-Reply-To: <CALCETrW1HFJ3QXMfV9Hv922eax0hbHJn5GmHPvtED8_JR5KOVg@mail.gmail.com>

From: Andy Lutomirski <luto@amacapital.net>
Date: Tue, 14 Oct 2014 19:03:11 -0700

> On Tue, Oct 14, 2014 at 7:01 PM, David Miller <davem@davemloft.net> wrote:
>> I really think this means I'll have to remove all of the netlink
>> mmap() support in order to avoid breaking applications. :(
>>
>> The other option is to keep NETLINK_TX_RING, but copy the data into
>> a kernel side buffer before acting upon it.
> 
> Option 3, which sucks but maybe not that badly: change the value of
> NETLINK_RX_RING.  (Practically: add NETLINK_RX_RING2 or something like
> that.)

That would work as well.

There are pros and cons to all of these approaches.

I was thinking that if we do the "TX mmap --> copy to kernel buffer"
approach, then if in the future we find a way to make it work
reliably, we can avoid the copy.  And frankly performance wise it's no
worse than what happens via normal sendmsg() calls.

And all applications using NETLINK_RX_RING keep working and keep
getting the performance boost.
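
The "copy to kernel buffer" option above amounts to avoiding a double
fetch: snapshot the frame once, then parse and validate only the private
copy, so a concurrent writer can no longer change it between the check
and the use. A minimal userspace sketch of that idea follows. It is an
illustration only, not the netlink code: struct demo_hdr and
demo_snapshot_frame() are hypothetical names, and the header layout is
made up rather than being struct nlmsghdr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical frame header, NOT the real struct nlmsghdr layout. */
struct demo_hdr {
	uint32_t len;	/* total frame length, header included */
	uint16_t type;
	uint16_t flags;
};

/*
 * Copy a frame out of memory that another thread may still be writing,
 * then validate only the private copy.  Returns the validated frame
 * length, or -1 if the frame is malformed.
 */
static long demo_snapshot_frame(const void *shared, size_t shared_len,
				void *priv, size_t priv_cap)
{
	struct demo_hdr hdr;
	size_t n = shared_len < priv_cap ? shared_len : priv_cap;

	assert(shared != NULL && priv != NULL);

	if (n < sizeof(hdr))
		return -1;

	/* Single fetch: after this, "shared" is never read again. */
	memcpy(priv, shared, n);
	memcpy(&hdr, priv, sizeof(hdr));

	/* All checks run against the snapshot, which nobody else can modify. */
	if (hdr.len < sizeof(hdr) || hdr.len > n)
		return -1;

	return (long)hdr.len;
}
```

The cost is one memcpy() per frame, which matches the point above that
this should be no worse than what a normal sendmsg() call already pays.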

* Re: [PATCH net-next,v2] hyperv: Add handling of IP header with option field in netvsc_set_hash()
From: David Miller @ 2014-10-15  2:05 UTC
  To: haiyangz; +Cc: olaf, netdev, jasowang, driverdev-devel, linux-kernel
In-Reply-To: <1413317117-28678-1-git-send-email-haiyangz@microsoft.com>

From: Haiyang Zhang <haiyangz@microsoft.com>
Date: Tue, 14 Oct 2014 20:05:17 +0000

> In case the IP header has an options field at the end, this patch will
> get the port numbers after that field, and compute the hash.
> 
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>

This isn't even close to what I asked for.

I said to remove all of this by-hand header parsing code in the
hyperv driver, and use the generic networking facilities
that already exist to do this.

* Re: [PATCH RFC v4 net 1/3] ipv6: Remove BACKTRACK macro
From: David Miller @ 2014-10-15  2:04 UTC
  To: kafai; +Cc: netdev, hannes
In-Reply-To: <20141015001409.GB27904@devbig242.prn2.facebook.com>

From: Martin Lau <kafai@fb.com>
Date: Tue, 14 Oct 2014 17:14:11 -0700

> Do you have input on the function signature and also the following
> two patches?

Sorry, patch review doesn't work this way.

Invest the time into reposting your series with the most glaring
problems fixed, and then reviewers will invest their time into
reviewing the updated patch series.

Thanks.

* Re: Netlink mmap tx security?
From: Andy Lutomirski @ 2014-10-15  2:03 UTC
  To: David Miller
  Cc: Linus Torvalds, Patrick McHardy, Network Development, Thomas Graf
In-Reply-To: <20141014.220111.179628329028952302.davem@davemloft.net>

On Tue, Oct 14, 2014 at 7:01 PM, David Miller <davem@davemloft.net> wrote:
> From: Andy Lutomirski <luto@amacapital.net>
> Date: Tue, 14 Oct 2014 15:16:46 -0700
>
>> It's at least remotely possible that there's something that assumes
>> that the availability of NETLINK_RX_RING implies
>> NETLINK_TX_RING, which would be unfortunate.
>
> I already found one such case, nlmon :-/
>
> It also reminds me that I'll have to update
> Documentation/networking/netlink_mmap.txt
>
> Thomas, the context is that we have to remove NETLINK_TX_RING support
> (there is absolutely no way whatsoever to reliably keep some thread of
> control from modifying the underlying pages while we parse and
> validate the netlink request).
>
> I'd like to be able to do so while retaining NETLINK_RX_RING because
> that works fine and is great for monitoring when the rate of events
> is high.
>
> But I already have found userland pieces of code, like nlmon, which
> assume that if one is present then both must be present.
>
> I really think this means I'll have to remove all of the netlink
> mmap() support in order to avoid breaking applications. :(
>
> The other option is to keep NETLINK_TX_RING, but copy the data into
> a kernel side buffer before acting upon it.

Option 3, which sucks but maybe not that badly: change the value of
NETLINK_RX_RING.  (Practically: add NETLINK_RX_RING2 or something like
that.)

--Andy

* Re: linux-next: build failure after merge of the net tree
From: David Miller @ 2014-10-15  2:02 UTC
  To: sfr; +Cc: netdev, linux-next, linux-kernel, peppe.cavallaro
In-Reply-To: <20141015104411.306b138e@canb.auug.org.au>

From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Wed, 15 Oct 2014 10:44:11 +1100

> Hi all,
> 
> After merging the net tree, today's linux-next build (arm
> multi_v7_defconfig) failed like this:
> 
> drivers/built-in.o: In function `.LANCHOR0':
> :(.rodata+0x6b764): undefined reference to `sti_gmac_data'
> :(.rodata+0x6b828): undefined reference to `sti_gmac_data'
> :(.rodata+0x6b8ec): undefined reference to `sti_gmac_data'
> :(.rodata+0x6b9b0): undefined reference to `sti_gmac_data'
> 
> Caused by commit 53b26b9bc9a5 ("stmmac: dwmac-sti: review the
> glue-logic for STi4xx and STiD127 SoCs") which renamed sti_gmac_data to
> stih4xx_dwmac_data (or something) without updating all the references
> to it (including the one added in the previous commit ...).
> 
> I reverted that commit for today.

Sigh, Giuseppe if I don't see a proper fix by tomorrow I'm
reverting all of the stmmac changes I applied today.

* Re: Netlink mmap tx security?
From: David Miller @ 2014-10-15  2:01 UTC
  To: luto; +Cc: torvalds, kaber, netdev, tgraf
In-Reply-To: <CALCETrWDSG2EhxBnM8Jm55Ov8PYQ_w3Wf88SKYOB1q1WgRpbEw@mail.gmail.com>

From: Andy Lutomirski <luto@amacapital.net>
Date: Tue, 14 Oct 2014 15:16:46 -0700

> It's at least remotely possible that there's something that assumes
> that the availability of NETLINK_RX_RING implies
> NETLINK_TX_RING, which would be unfortunate.

I already found one such case, nlmon :-/

It also reminds me that I'll have to update
Documentation/networking/netlink_mmap.txt

Thomas, the context is that we have to remove NETLINK_TX_RING support
(there is absolutely no way whatsoever to reliably keep some thread of
control from modifying the underlying pages while we parse and
validate the netlink request).

I'd like to be able to do so while retaining NETLINK_RX_RING because
that works fine and is great for monitoring when the rate of events
is high.

But I already have found userland pieces of code, like nlmon, which
assume that if one is present then both must be present.

I really think this means I'll have to remove all of the netlink
mmap() support in order to avoid breaking applications. :(

The other option is to keep NETLINK_TX_RING, but copy the data into
a kernel side buffer before acting upon it.

* Re: [PATCH net 0/4] ipv6 and related cleanup for cxgb4/cxgb4i
From: David Miller @ 2014-10-15  1:52 UTC
  To: anish; +Cc: netdev, hariprasad, leedom, kxie, manojmalviya
In-Reply-To: <1413324791-12104-1-git-send-email-anish@chelsio.com>

From: Anish Bhatt <anish@chelsio.com>
Date: Tue, 14 Oct 2014 15:13:07 -0700

> This patch set removes some duplicated/extraneous code from cxgb4i,
> and guards cxgb4 against compilation failure based on ipv6 tristate.

Sorry Anish, you can't have your Subject lines be the entire
content of your commit messages.

Write something more substantial in your logs, and make the
Subject line be a very concise summary.

* RE: [PATCH] net: fec: ptp: fix convergence issue to support LinuxPTP stack
From: fugang.duan @ 2014-10-15  1:37 UTC
  To: Richard Cochran, Frank.Li@freescale.com
  Cc: David Miller, netdev@vger.kernel.org, bhutchings@solarflare.com
In-Reply-To: <20141014184952.GC14216@localhost.localdomain>

From: Richard Cochran <richardcochran@gmail.com> Sent: Wednesday, October 15, 2014 2:50 AM
>To: Li Frank-B20596
>Cc: David Miller; Duan Fugang-B38611; netdev@vger.kernel.org;
>bhutchings@solarflare.com
>Subject: Re: [PATCH] net: fec: ptp: fix convergence issue to support
>LinuxPTP stack
>
>On Tue, Oct 14, 2014 at 06:43:51PM +0000, Frank.Li@freescale.com wrote:
>> Only MX6 SX. Only MX6 SX added FEC_QUIRK_BUG_CAPTURE.
>
>But what about this comment:
>
>+/* ENET Block Guide/ Chapter for the iMX6SLX (PELE) address one issue:
>+ * Incorrect behavior for ENET_ATCR[Capture and Restart Bits]. These
>+bits will
>+ * always read a value zero. When these bits are set to 1'b1, these
>+should hold
>+ * value 1'b1 until the counter value is capture in the register clock
>domain.
>+ */
>
>It sounds like the bits are "sticky" until the counter value has been
>latched. Therefore you need to check the bits before reading the counter,
>but the code does not do this for the case of !SX.
>
>Thanks,
>Richard

Hi, Richard,

These bits "__should__" hold 1'b1 until the counter value is captured in the register clock domain.
But the comment also says: "Incorrect behavior for ENET_ATCR[Capture and Restart Bits]. These bits will always read a value zero".


Thanks,
Andy

* RE: [PATCH] net: fec: ptp: fix convergence issue to support LinuxPTP stack
From: fugang.duan @ 2014-10-15  1:30 UTC
  To: Richard Cochran, Frank.Li@freescale.com
  Cc: David Miller, netdev@vger.kernel.org, bhutchings@solarflare.com
In-Reply-To: <20141014183933.GA14216@localhost.localdomain>

From: Richard Cochran <richardcochran@gmail.com> Sent: Wednesday, October 15, 2014 2:40 AM
>To: Li Frank-B20596
>Cc: David Miller; Duan Fugang-B38611; netdev@vger.kernel.org;
>bhutchings@solarflare.com
>Subject: Re: [PATCH] net: fec: ptp: fix convergence issue to support
>LinuxPTP stack
>
>Frank,
>
>On Tue, Oct 14, 2014 at 06:27:28PM +0000, Frank.Li@freescale.com wrote:
>> Fugang's patch fix the problem existed in MX6SX platform.
>
>You say that Fugang's patch fixes a quirk in the SX device.
>
>But he said that the user's manual tells us to wait for some status bits
>to clear. (See the third line, with the 'while' key word).
>
>> > > On Tue, Oct 14, 2014 at 01:39:47PM +0800, Fugang Duan wrote:
>> > >> IEEE 1588 module has one hw issue in capturing the ATVR register.
>> > >> According to the user manual it is:
>> > >> 		ENET0->ATCR |= ENET_ATCR_CAPTURE_MASK;
>> > >> 		while(ENET0->ATCR & ENET_ATCR_CAPTURE_MASK);
>> > >> 		ts_counter_ns = ENET0->ATVR;
>
>That applies to all IMX devices, doesn't it?
>In any case, Fugang's change log is not clear.
>
>Confused,
>Richard

Please read the entire commit log.

Regards,
Andy

* RE: [PATCH] net: fec: ptp: fix convergence issue to support LinuxPTP stack
From: fugang.duan @ 2014-10-15  1:28 UTC
  To: Richard Cochran, Frank.Li@freescale.com
  Cc: David Miller, netdev@vger.kernel.org, bhutchings@solarflare.com
In-Reply-To: <20141014184952.GC14216@localhost.localdomain>

From: Richard Cochran <richardcochran@gmail.com> Sent: Wednesday, October 15, 2014 2:50 AM
>To: Li Frank-B20596
>Cc: David Miller; Duan Fugang-B38611; netdev@vger.kernel.org;
>bhutchings@solarflare.com
>Subject: Re: [PATCH] net: fec: ptp: fix convergence issue to support
>LinuxPTP stack
>
>On Tue, Oct 14, 2014 at 06:43:51PM +0000, Frank.Li@freescale.com wrote:
>> Only MX6 SX. Only MX6 SX added FEC_QUIRK_BUG_CAPTURE.
>
>But what about this comment:
>
>+/* ENET Block Guide/ Chapter for the iMX6SLX (PELE) address one issue:
>+ * Incorrect behavior for ENET_ATCR[Capture and Restart Bits]. These
>+bits will
>+ * always read a value zero. When these bits are set to 1'b1, these
>+should hold
>+ * value 1'b1 until the counter value is capture in the register clock
>domain.
>+ */
>
>It sounds like the bits are "sticky" until the counter value has been
>latched. Therefore you need to check the bits before reading the counter,
>but the code does not do this for the case of !SX.
>
>Thanks,
>Richard


Hi, Richard,

Firstly, the patch only fixes the PTP issue for imx6sx.
Secondly, please see the patch commit log:

    IEEE 1588 module has one hw issue in capturing the ATVR register. According
    to the user manual it is:
               ENET0->ATCR |= ENET_ATCR_CAPTURE_MASK;
               while(ENET0->ATCR & ENET_ATCR_CAPTURE_MASK);
               ts_counter_ns = ENET0->ATVR;
    Incorrect behavior for ENET_ATCR[Capture and Restart Bits]. These bits will always
    read a value zero. According to SPEC, when these bits are set to 1'b1, these should
    hold value 1'b1 until the counter value is captured in the register clock domain.
    
    Unfortunately there is a bug with the way the bit "ENET_ATCR_CAPTURE" clears.
    So we need something like:
               ENET0->ATCR |= ENET_ATCR_CAPTURE_MASK;
               wait();
               ts_counter_ns = ENET0->ATVR;
    
    The wait time must be at least 6 clock cycles of the slower of the register
    clock and the 1588 clock. The 1588 ts_clk is 25MHz and the register clock is 66MHz, so
    the wait time must be greater than 240ns (40ns * 6). The workaround is to add a 1us
    delay before reading ATVR.



So a delay needs to be added instead of the while() loop.

Thanks,
Andy
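
The wait-time arithmetic in the commit log can be checked with a few
lines of C. This is only a sketch of the calculation, assuming the clock
rates quoted above (25MHz ts_clk, 66MHz register clock);
demo_period_ns() and demo_min_wait_ns() are hypothetical helper names,
not part of the fec driver.

```c
#include <assert.h>
#include <stdint.h>

/* Period of one clock cycle in nanoseconds, rounded up. */
static uint64_t demo_period_ns(uint64_t hz)
{
	assert(hz > 0);
	return (1000000000ULL + hz - 1) / hz;
}

/* Minimum settle time: "cycles" periods of the slower of the two clocks. */
static uint64_t demo_min_wait_ns(uint64_t clk_a_hz, uint64_t clk_b_hz,
				 unsigned int cycles)
{
	uint64_t slow_hz = clk_a_hz < clk_b_hz ? clk_a_hz : clk_b_hz;

	return (uint64_t)cycles * demo_period_ns(slow_hz);
}
```

With those rates the slower clock has a 40ns period, so six cycles come
to 240ns, which the 1us delay covers with margin.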

* RE: [PATCH net-next, v2] hyperv: Add handling of IP header with option field in netvsc_set_hash()
From: Haiyang Zhang @ 2014-10-15  0:41 UTC
  To: Haiyang Zhang, davem@davemloft.net, netdev@vger.kernel.org
  Cc: driverdev-devel@linuxdriverproject.org, olaf@aepfle.de,
	jasowang@redhat.com, linux-kernel@vger.kernel.org
In-Reply-To: <1413317117-28678-1-git-send-email-haiyangz@microsoft.com>



> -----Original Message-----
> From: Haiyang Zhang [mailto:haiyangz@microsoft.com]
> Sent: Tuesday, October 14, 2014 4:05 PM
> To: davem@davemloft.net; netdev@vger.kernel.org
> Cc: Haiyang Zhang; KY Srinivasan; olaf@aepfle.de; jasowang@redhat.com;
> linux-kernel@vger.kernel.org; driverdev-devel@linuxdriverproject.org
> Subject: [PATCH net-next,v2] hyperv: Add handling of IP header with
> option field in netvsc_set_hash()
> 

This is a duplicate of a patch submitted earlier today. Please ignore.

* Re: [PATCH RFC v4 net 1/3] ipv6: Remove BACKTRACK macro
From: Martin Lau @ 2014-10-15  0:14 UTC
  To: David Miller; +Cc: netdev, hannes
In-Reply-To: <20141014.150106.2250383188165649088.davem@davemloft.net>

Hi,

> > +struct fib6_node *fib6_backtrack(struct fib6_node *fn,
> > +				 struct in6_addr *saddr);
> > +
> 
> I am completely mystified why you did this, could you explain the
> logic?  I want to know what drove you to make this exported.
>
> I marked it static in my example patch, and there is no caller outside
> of route.c
> 

I was thinking this function only works on 'struct fib6_node', so
it belongs in ip6_fib.c more than route.c,
e.g. like fib6_lookup(), whose callers are also only in route.c.

> Doing this also eliminates inlining opportunities.
> 
> Please keep this private inside of route.c
I will keep it private in route.c and re-submit.

Do you have input on the function signature and also the following two
patches?

Thanks,
--Martin

* linux-next: build failure after merge of the net tree
From: Stephen Rothwell @ 2014-10-14 23:44 UTC
  To: David Miller, netdev; +Cc: linux-next, linux-kernel, Giuseppe CAVALLARO


Hi all,

After merging the net tree, today's linux-next build (arm
multi_v7_defconfig) failed like this:

drivers/built-in.o: In function `.LANCHOR0':
:(.rodata+0x6b764): undefined reference to `sti_gmac_data'
:(.rodata+0x6b828): undefined reference to `sti_gmac_data'
:(.rodata+0x6b8ec): undefined reference to `sti_gmac_data'
:(.rodata+0x6b9b0): undefined reference to `sti_gmac_data'

Caused by commit 53b26b9bc9a5 ("stmmac: dwmac-sti: review the
glue-logic for STi4xx and STiD127 SoCs") which renamed sti_gmac_data to
stih4xx_dwmac_data (or something) without updating all the references
to it (including the one added in the previous commit ...).

I reverted that commit for today.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au


* RE: [PATCH v2] ixgbe: check adapter->vfinfo before dereference
From: Tantilov, Emil S @ 2014-10-14 23:18 UTC
  To: Thierry Herbelot, Kirsher, Jeffrey T, Brandeburg, Jesse,
	Allan, Bruce W, netdev@vger.kernel.org
In-Reply-To: <1412930732-892-1-git-send-email-thierry.herbelot@6wind.com>

>-----Original Message-----
>From: netdev-owner@vger.kernel.org [mailto:netdev-
>owner@vger.kernel.org] On Behalf Of Thierry Herbelot
>Sent: Friday, October 10, 2014 1:46 AM
>To: Kirsher, Jeffrey T; Brandeburg, Jesse; Allan, Bruce W;
>netdev@vger.kernel.org
>Cc: Thierry Herbelot
>Subject: [PATCH v2] ixgbe: check adapter->vfinfo before
>dereference
>
>this protects against the following panic:
>(before a VF was actually created on p96p1 PF Ethernet port)
>ip link set p96p1 vf 0 spoofchk off
>BUG: unable to handle kernel NULL pointer dereference at
>0000000000000052
>IP: [<ffffffffa044a1c1>]
>ixgbe_ndo_set_vf_spoofchk+0x51/0x150 [ixgbe]
>
>Signed-off-by: Thierry Herbelot <thierry.herbelot@6wind.com>
>---
> drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c |   73
>+++++++++++++++++++++++-
> 1 file changed, 70 insertions(+), 3 deletions(-)
>
>diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
>b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
>index 706fc69..29279ad 100644
>--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
>+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
>@@ -316,7 +316,7 @@ static int ixgbe_set_vf_multicasts(struct ixgbe_adapter *adapter,
> 	int entries = (msgbuf[0] & IXGBE_VT_MSGINFO_MASK)
> 		       >> IXGBE_VT_MSGINFO_SHIFT;
> 	u16 *hash_list = (u16 *)&msgbuf[1];
>-	struct vf_data_storage *vfinfo = &adapter->vfinfo[vf];
>+	struct vf_data_storage *vfinfo;
> 	struct ixgbe_hw *hw = &adapter->hw;
> 	int i;
> 	u32 vector_bit;
>@@ -324,6 +324,11 @@ static int
>ixgbe_set_vf_multicasts(struct ixgbe_adapter *adapter,
> 	u32 mta_reg;
> 	u32 vmolr = IXGBE_READ_REG(hw, IXGBE_VMOLR(vf));
>
>+	if (!adapter->vfinfo)
>+		return -1;
>+
>+	vfinfo = &adapter->vfinfo[vf];

This check makes sense for the ndo functions that get called by the ip command, but I don't think we need to add it before every use of adapter->vfinfo. In this case, for example, ixgbe_set_vf_multicasts() is called from
ixgbe_rcv_msg_from_vf(), which is only called when an actual VF exists.

Also, for the error, -EINVAL probably makes more sense than -1.

Thanks,
Emil
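
A minimal sketch of the guard discussed here, for the ndo-style entry
point only. The struct and function names (demo_adapter, demo_vf,
demo_set_vf_spoofchk) are hypothetical stand-ins, not the ixgbe
definitions, and the range check on the VF index is an added assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's per-VF and adapter state. */
struct demo_vf {
	int spoofchk;
};

struct demo_adapter {
	struct demo_vf *vfinfo;	/* NULL until VFs have been created */
	unsigned int num_vfs;
};

/*
 * Entry point reachable from userspace before any VF exists, so it must
 * check vfinfo itself and report a proper errno value instead of -1.
 */
static int demo_set_vf_spoofchk(struct demo_adapter *adapter,
				unsigned int vf, int setting)
{
	assert(adapter != NULL);

	if (!adapter->vfinfo || vf >= adapter->num_vfs)
		return -EINVAL;

	adapter->vfinfo[vf].spoofchk = setting;
	return 0;
}
```

Internal paths that only run once a VF exists would not need the same
check, which is the distinction drawn above.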

* Re: [PATCH RFC] virtio_net: enable tx interrupt
From: Michael S. Tsirkin @ 2014-10-14 23:11 UTC
  To: linux-kernel; +Cc: netdev, virtualization
In-Reply-To: <1413323524-23380-1-git-send-email-mst@redhat.com>

On Wed, Oct 15, 2014 at 12:53:59AM +0300, Michael S. Tsirkin wrote:
>  static void skb_xmit_done(struct virtqueue *vq)
>  {
>  	struct virtnet_info *vi = vq->vdev->priv;
> +	struct send_queue *sq = &vi->sq[vq2txq(vq)];
>  
> -	/* Suppress further interrupts. */
> -	virtqueue_disable_cb(vq);
> -

One note here: the current code seems racy because it calls
virtqueue_disable_cb from skb_xmit_done (which I'm dropping here): there's
no guarantee we don't get an interrupt while the tx ring is running, and if
that happens we can end up with interrupts disabled forever.

> -	/* We were probably waiting for more output buffers. */
> -	netif_wake_subqueue(vi->dev, vq2txq(vq));
> +	if (napi_schedule_prep(&sq->napi)) {
> +		__napi_schedule(&sq->napi);
> +	}
>  }
>  
>  static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)

* Re: [PATCH net-next RFC 0/3] virtio-net: Conditionally enable tx interrupt
From: Michael S. Tsirkin @ 2014-10-14 23:06 UTC
  To: David Miller; +Cc: kvm, netdev, linux-kernel, virtualization, linux-api
In-Reply-To: <20141014.145327.365091739350390288.davem@davemloft.net>

On Tue, Oct 14, 2014 at 02:53:27PM -0400, David Miller wrote:
> From: Jason Wang <jasowang@redhat.com>
> Date: Sat, 11 Oct 2014 15:16:43 +0800
> 
> > We free old transmitted packets in ndo_start_xmit() currently, so any
> > packet must be orphaned also there. This was used to reduce the overhead of
> > tx interrupt to achieve better performance. But this may not work for some
> > protocols such as TCP stream. TCP depends on the value of sk_wmem_alloc to
> > implement various optimization for small packets stream such as TCP small
> > queue and auto corking. But orphaning packets early in ndo_start_xmit()
> > disable such things more or less since sk_wmem_alloc was not accurate. This
> > lead extra low throughput for TCP stream of small writes.
> > 
> > This series tries to solve this issue by enable tx interrupts for all TCP
> > packets other than the ones with push bit or pure ACK. This is done through
> > the support of urgent descriptor which can force an interrupt for a
> > specified packet. If tx interrupt was enabled for a packet, there's no need
> > to orphan it in ndo_start_xmit(), we can free it tx napi which is scheduled
> > by tx interrupt. Then sk_wmem_alloc was more accurate than before and TCP
> > can batch more for small write. More larger skb was produced by TCP in this
> > case to improve both throughput and cpu utilization.
> > 
> > Test shows great improvements on small write tcp streams. For most of the
> > other cases, the throughput and cpu utilization are the same in the
> > past. Only few cases, more cpu utilization was noticed which needs more
> > investigation.
> > 
> > Review and comments are welcomed.
> 
> I think proper accounting and queueing (at all levels, not just TCP
> sockets) is more important than trying to skim a bunch of cycles by
> avoiding TX interrupts.
> 
> Having an event to free the SKB is absolutely essential for the stack
> to operate correctly.
> 
> And with virtio-net you don't even have the excuse of "the HW
> unfortunately doesn't have an appropriate TX event."
> 
> So please don't play games, and instead use TX interrupts all the
> time.  You can mitigate them in various ways, but don't turn them on
> selectively based upon traffic type, that's terrible.
> 
> You can even use ->xmit_more to defer the TX interrupt indication to
> the final TX packet in the chain.

I guess we can just defer the kick; the interrupt will naturally be
deferred as well.
This should solve the problem for old hosts as well.

We'll also need to implement bql for this.
Something like the below?
Completely untested - posting here to see if I figured the
API out correctly. Has to be applied on top of the previous patch.

---

virtio_net: bql + xmit_more

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

---

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 62c059d..c245047 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -213,13 +213,15 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
 	return p;
 }
 
-static int free_old_xmit_skbs(struct send_queue *sq, int budget)
+static int free_old_xmit_skbs(struct netdev_queue *txq,
+			      struct send_queue *sq, int budget)
 {
 	struct sk_buff *skb;
 	unsigned int len;
 	struct virtnet_info *vi = sq->vq->vdev->priv;
 	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
 	int sent = 0;
+	unsigned int bytes = 0;
 
 	while (sent < budget &&
 	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
@@ -227,6 +229,7 @@ static int free_old_xmit_skbs(struct send_queue *sq, int budget)
 
 		u64_stats_update_begin(&stats->tx_syncp);
 		stats->tx_bytes += skb->len;
+		bytes += skb->len;
 		stats->tx_packets++;
 		u64_stats_update_end(&stats->tx_syncp);
 
@@ -234,6 +237,8 @@ static int free_old_xmit_skbs(struct send_queue *sq, int budget)
 		sent++;
 	}
 
+	netdev_tx_completed_queue(txq, sent, bytes);
+
 	return sent;
 }
 
@@ -802,7 +807,7 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
 again:
 	__netif_tx_lock(txq, smp_processor_id());
 	virtqueue_disable_cb(sq->vq);
-	sent += free_old_xmit_skbs(sq, budget - sent);
+	sent += free_old_xmit_skbs(txq, sq, budget - sent);
 
 	if (sent < budget) {
 		r = virtqueue_enable_cb_prepare(sq->vq);
@@ -951,6 +956,9 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	int qnum = skb_get_queue_mapping(skb);
 	struct send_queue *sq = &vi->sq[qnum];
 	int err, qsize = virtqueue_get_vring_size(sq->vq);
+	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
+	bool kick = !skb->xmit_more || netif_xmit_stopped(txq);
+	unsigned int bytes = skb->len;
 
 	virtqueue_disable_cb(sq->vq);
 
@@ -967,7 +975,11 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 		dev_kfree_skb_any(skb);
 		return NETDEV_TX_OK;
 	}
-	virtqueue_kick(sq->vq);
+
+	netdev_tx_sent_queue(txq, bytes);
+
+	if (kick)
+		virtqueue_kick(sq->vq);
 
 	/* Apparently nice girls don't return TX_BUSY; stop the queue
 	 * before it gets out of hand.  Naturally, this wastes entries. */
@@ -975,14 +987,14 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 		netif_stop_subqueue(dev, qnum);
 		if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
 			/* More just got used, free them then recheck. */
-			free_old_xmit_skbs(sq, qsize);
+			free_old_xmit_skbs(txq, sq, qsize);
 			if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
 				netif_start_subqueue(dev, qnum);
 				virtqueue_disable_cb(sq->vq);
 			}
 		}
 	} else if (virtqueue_enable_cb_delayed(sq->vq)) {
-		free_old_xmit_skbs(sq, qsize);
+		free_old_xmit_skbs(txq, sq, qsize);
 	}
 
 	return NETDEV_TX_OK;
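
The kick deferral in the patch above can be modelled in userspace to see
its effect. This sketch is a toy under stated assumptions, not
virtio_net code: struct demo_txq, demo_xmit() and demo_complete() are
hypothetical, with the byte counter standing in for what
netdev_tx_sent_queue() and netdev_tx_completed_queue() account for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one transmit queue. */
struct demo_txq {
	unsigned int pending;		/* queued but not yet announced */
	unsigned int kicks;		/* notifications sent to the host */
	unsigned int inflight_bytes;	/* BQL-style byte accounting */
	bool stopped;
};

/* Queue one packet; notify only at the end of a burst or when stopped. */
static void demo_xmit(struct demo_txq *q, unsigned int len, bool xmit_more)
{
	bool kick = !xmit_more || q->stopped;

	assert(q != NULL);

	q->pending++;
	q->inflight_bytes += len;	/* role of netdev_tx_sent_queue() */

	if (kick) {
		q->kicks++;
		q->pending = 0;
	}
}

/* Completion side: report freed bytes back to the accounting. */
static void demo_complete(struct demo_txq *q, unsigned int bytes)
{
	assert(q != NULL && bytes <= q->inflight_bytes);
	q->inflight_bytes -= bytes;	/* role of netdev_tx_completed_queue() */
}
```

A burst of three packets with xmit_more set on the first two produces a
single notification in this model, while the byte accounting still sees
every packet.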

* [PATCH v2] net: Add ndo_gso_check
From: Tom Herbert @ 2014-10-14 22:19 UTC
  To: davem, netdev

Add ndo_gso_check which a device can define to indicate whether it
is capable of doing GSO on a packet. This function would be called from
the stack to determine whether software GSO needs to be done. A
driver should populate this function if it advertises GSO types for
which there are combinations that it wouldn't be able to handle. For
instance a device that performs UDP tunneling might only implement
support for transparent Ethernet bridging type of inner packets
or might have limitations on lengths of inner headers.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 drivers/net/macvtap.c      |  2 +-
 drivers/net/xen-netfront.c |  2 +-
 include/linux/netdevice.h  | 12 +++++++++++-
 net/core/dev.c             |  2 +-
 4 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 0c6adaa..65e2892 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -298,7 +298,7 @@ static rx_handler_result_t macvtap_handle_frame(struct sk_buff **pskb)
 	 */
 	if (q->flags & IFF_VNET_HDR)
 		features |= vlan->tap_features;
-	if (netif_needs_gso(skb, features)) {
+	if (netif_needs_gso(dev, skb, features)) {
 		struct sk_buff *segs = __skb_gso_segment(skb, features, false);
 
 		if (IS_ERR(segs))
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index ca82f54..3c0b375 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -638,7 +638,7 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	if (unlikely(!netif_carrier_ok(dev) ||
 		     (slots > 1 && !xennet_can_sg(dev)) ||
-		     netif_needs_gso(skb, netif_skb_features(skb)))) {
+		     netif_needs_gso(dev, skb, netif_skb_features(skb)))) {
 		spin_unlock_irqrestore(&queue->tx_lock, flags);
 		goto drop;
 	}
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 838407a..74fd5d3 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -998,6 +998,12 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
  *	Callback to use for xmit over the accelerated station. This
  *	is used in place of ndo_start_xmit on accelerated net
  *	devices.
+ * bool	(*ndo_gso_check) (struct sk_buff *skb,
+ *			  struct net_device *dev);
+ *	Called by core transmit path to determine if device is capable of
+ *	performing GSO on a packet. The device returns true if it is
+ *	able to GSO the packet, false otherwise. If the return value is
+ *	false the stack will do software GSO.
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -1147,6 +1153,8 @@ struct net_device_ops {
 							struct net_device *dev,
 							void *priv);
 	int			(*ndo_get_lock_subclass)(struct net_device *dev);
+	bool			(*ndo_gso_check) (struct sk_buff *skb,
+						  struct net_device *dev);
 };
 
 /**
@@ -3572,10 +3580,12 @@ static inline bool skb_gso_ok(struct sk_buff *skb, netdev_features_t features)
 	       (!skb_has_frag_list(skb) || (features & NETIF_F_FRAGLIST));
 }
 
-static inline bool netif_needs_gso(struct sk_buff *skb,
+static inline bool netif_needs_gso(struct net_device *dev, struct sk_buff *skb,
 				   netdev_features_t features)
 {
 	return skb_is_gso(skb) && (!skb_gso_ok(skb, features) ||
+		(dev->netdev_ops->ndo_gso_check &&
+		 !dev->netdev_ops->ndo_gso_check(skb, dev)) ||
 		unlikely((skb->ip_summed != CHECKSUM_PARTIAL) &&
 			 (skb->ip_summed != CHECKSUM_UNNECESSARY)));
 }
diff --git a/net/core/dev.c b/net/core/dev.c
index 3c5bdaa..0ae59ec 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2677,7 +2677,7 @@ static struct sk_buff *validate_xmit_skb(struct sk_buff *skb, struct net_device
 	if (skb->encapsulation)
 		features &= dev->hw_enc_features;
 
-	if (netif_needs_gso(skb, features)) {
+	if (netif_needs_gso(dev, skb, features)) {
 		struct sk_buff *segs;
 
 		segs = skb_gso_segment(skb, features);
-- 
2.1.0.rc2.206.gedb03e5
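
The decision that the modified netif_needs_gso() makes can be sketched
outside the kernel as follows. The types and names (demo_pkt, demo_dev,
demo_needs_sw_gso, demo_limit_inner_hdr) are hypothetical, the three
booleans stand in for skb_is_gso(), skb_gso_ok() and the ip_summed test,
and the 64-byte inner header limit is an arbitrary example value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the packet properties the check looks at. */
struct demo_pkt {
	bool is_gso;		/* stands in for skb_is_gso() */
	bool features_ok;	/* stands in for skb_gso_ok() */
	bool csum_ok;		/* stands in for the ip_summed test */
	unsigned int inner_hdr_len;
};

/* Hypothetical device with an optional per-packet GSO veto. */
struct demo_dev {
	bool (*gso_check)(const struct demo_pkt *pkt);
};

/* True when the stack has to segment the packet in software. */
static bool demo_needs_sw_gso(const struct demo_dev *dev,
			      const struct demo_pkt *pkt)
{
	assert(dev != NULL && pkt != NULL);

	return pkt->is_gso && (!pkt->features_ok ||
		(dev->gso_check && !dev->gso_check(pkt)) ||
		!pkt->csum_ok);
}

/* Example callback: device only handles short inner headers. */
static bool demo_limit_inner_hdr(const struct demo_pkt *pkt)
{
	return pkt->inner_hdr_len <= 64;
}
```

A device that leaves the callback unset behaves exactly as before; only
a device that installs one can force the software fallback for packets
its advertised feature flags would otherwise accept.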

* Re: Netlink mmap tx security?
From: Andy Lutomirski @ 2014-10-14 22:16 UTC
  To: David Miller; +Cc: Linus Torvalds, Patrick McHardy, Network Development
In-Reply-To: <20141014.160056.2113064815910782529.davem@davemloft.net>

On Tue, Oct 14, 2014 at 1:00 PM, David Miller <davem@davemloft.net> wrote:
> From: Andy Lutomirski <luto@amacapital.net>
> Date: Tue, 14 Oct 2014 12:33:43 -0700
>
>> For full honesty, there is now the machinery developed for memfd
>> sealing, but I can't imagine that this is ever faster than just
>> copying the buffer.
>
> I don't have much motivation to even check if it's a worthwhile
> pursuit at this point.
>
> Someone who wants to can :-)
>
>> I think that the NETLINK_SKB_TX declaration in include/linux/netlink.h
>> should probably go, too.  And there's the last parameter to
>> netlink_set_ring, too, and possibly even the nlk->tx_ring struct
>> itself.
>
> Agreed on all counts, here is the new patch:
>
> ====================
> [PATCH] netlink: Remove TX mmap support.
>
> There is no reasonable manner in which to absolutely make sure that another
> thread of control cannot write to the pages in the mmap()'d area and thus
> make sure that netlink messages do not change underneath us after we've
> performed verifications.
>
> Reported-by: Andy Lutomirski <luto@amacapital.net>
> Signed-off-by: David S. Miller <davem@davemloft.net>

Looks sensible to me, but I have no idea how to test it.

It's at least remotely possible that there's something that assumes
that the availability of NETLINK_RX_RING implies
NETLINK_TX_RING, which would be unfortunate.

--Andy

* [PATCH net-next, v2] hyperv: Add handling of IP header with option field in netvsc_set_hash()
From: Haiyang Zhang @ 2014-10-14 22:16 UTC
  To: davem, netdev; +Cc: olaf, jasowang, driverdev-devel, linux-kernel, haiyangz

If the IP header carries an options field at the end, this patch reads
the port numbers from after that field and computes the hash.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
---
 drivers/net/hyperv/netvsc_drv.c |   16 ++++++++++------
 1 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 0fcb5e7..0d60c91 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -162,7 +162,7 @@ union sub_key {
  * data: network byte order
  * return: host byte order
  */
-static u32 comp_hash(u8 *key, int klen, u8 *data, int dlen)
+static u32 comp_hash(u8 *key, int klen, void *data, int dlen)
 {
 	union sub_key subk;
 	int k_next = 4;
@@ -176,7 +176,7 @@ static u32 comp_hash(u8 *key, int klen, u8 *data, int dlen)
 	for (i = 0; i < dlen; i++) {
 		subk.kb = key[k_next];
 		k_next = (k_next + 1) % klen;
-		dt = data[i];
+		dt = ((u8 *)data)[i];
 		for (j = 0; j < 8; j++) {
 			if (dt & 0x80)
 				ret ^= subk.ka;
@@ -191,6 +191,7 @@ static u32 comp_hash(u8 *key, int klen, u8 *data, int dlen)
 static bool netvsc_set_hash(u32 *hash, struct sk_buff *skb)
 {
 	struct iphdr *iphdr;
+	__be32 dbuf[3];
 	int data_len;
 	bool ret = false;
 
@@ -200,12 +201,15 @@ static bool netvsc_set_hash(u32 *hash, struct sk_buff *skb)
 	iphdr = ip_hdr(skb);
 
 	if (iphdr->version == 4) {
-		if (iphdr->protocol == IPPROTO_TCP)
+		dbuf[0] = iphdr->saddr;
+		dbuf[1] = iphdr->daddr;
+		if (iphdr->protocol == IPPROTO_TCP) {
+			dbuf[2] = *(__be32 *)&tcp_hdr(skb)->source;
 			data_len = 12;
-		else
+		} else {
 			data_len = 8;
-		*hash = comp_hash(netvsc_hash_key, HASH_KEYLEN,
-				  (u8 *)&iphdr->saddr, data_len);
+		}
+		*hash = comp_hash(netvsc_hash_key, HASH_KEYLEN, dbuf, data_len);
 		ret = true;
 	}
 
-- 
1.7.1
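
For readers following the commit message above: with IPv4 options present, the TCP ports no longer sit directly after the destination address, so the hash input has to be assembled from two places. The following is a rough userspace sketch of that idea on a raw packet buffer. It is not the driver's code; `build_hash_input()` is a hypothetical helper, and the sketch assumes an unfragmented packet.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PROTO_TCP 6	/* IANA protocol number for TCP */

/*
 * Assemble the hash input from a raw IPv4 packet: source address,
 * destination address and, for TCP, the source/destination port pair.
 *
 * Returns the number of bytes written to out (8 or 12), or 0 if the
 * packet is malformed. Assumes the packet is not a fragment.
 */
static size_t build_hash_input(const uint8_t *pkt, size_t pkt_len,
			       uint8_t out[12])
{
	size_t ihl;

	if (pkt_len < 20 || (pkt[0] >> 4) != 4)
		return 0;

	/* IHL counts 32-bit words; anything above 5 means options. */
	ihl = (size_t)(pkt[0] & 0x0f) * 4;
	if (ihl < 20 || ihl > pkt_len)
		return 0;

	/* saddr is at byte offset 12, daddr at 16; they are contiguous. */
	memcpy(out, pkt + 12, 8);

	if (pkt[9] != DEMO_PROTO_TCP)
		return 8;

	/* The ports are the first 4 bytes of the TCP header, which
	 * starts after the options, not at a fixed offset of 20. */
	if (pkt_len < ihl + 4)
		return 0;
	memcpy(out + 8, pkt + ihl, 4);

	return 12;
}
```

The returned length plays the role of `data_len` in the patch: 12 for TCP, 8 otherwise.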

^ permalink raw reply related

* [PATCH net 3/4] cxgb4i : All this code is only needed when IPV6 is enabled, disable when not required, also fixes -Wunused-function warning
From: Anish Bhatt @ 2014-10-14 22:13 UTC (permalink / raw)
  To: netdev; +Cc: davem, hariprasad, leedom, kxie, manojmalviya, Anish Bhatt
In-Reply-To: <1413324791-12104-1-git-send-email-anish@chelsio.com>

Signed-off-by: Anish Bhatt <anish@chelsio.com>
---
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c | 8 ++++++++
 drivers/scsi/cxgbi/libcxgbi.c      | 2 ++
 2 files changed, 10 insertions(+)

diff --git a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
index 18d0d1c145ad..df176f0e5e60 100644
--- a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
+++ b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
@@ -259,6 +259,7 @@ static void send_act_open_req(struct cxgbi_sock *csk, struct sk_buff *skb,
 	cxgb4_l2t_send(csk->cdev->ports[csk->port_id], skb, csk->l2t);
 }
 
+#if IS_ENABLED(CONFIG_IPV6)
 static void send_act_open_req6(struct cxgbi_sock *csk, struct sk_buff *skb,
 			       struct l2t_entry *e)
 {
@@ -344,6 +345,7 @@ static void send_act_open_req6(struct cxgbi_sock *csk, struct sk_buff *skb,
 
 	cxgb4_l2t_send(csk->cdev->ports[csk->port_id], skb, csk->l2t);
 }
+#endif
 
 static void send_close_req(struct cxgbi_sock *csk)
 {
@@ -781,9 +783,11 @@ static void csk_act_open_retry_timer(unsigned long data)
 	if (csk->csk_family == AF_INET) {
 		send_act_open_func = send_act_open_req;
 		skb = alloc_wr(size, 0, GFP_ATOMIC);
+#if IS_ENABLED(CONFIG_IPV6)
 	} else {
 		send_act_open_func = send_act_open_req6;
 		skb = alloc_wr(size6, 0, GFP_ATOMIC);
+#endif
 	}
 
 	if (!skb)
@@ -1335,8 +1339,10 @@ static int init_act_open(struct cxgbi_sock *csk)
 
 	if (csk->csk_family == AF_INET)
 		skb = alloc_wr(size, 0, GFP_NOIO);
+#if IS_ENABLED(CONFIG_IPV6)
 	else
 		skb = alloc_wr(size6, 0, GFP_NOIO);
+#endif
 
 	if (!skb)
 		goto rel_resource;
@@ -1370,8 +1376,10 @@ static int init_act_open(struct cxgbi_sock *csk)
 	cxgbi_sock_set_state(csk, CTP_ACTIVE_OPEN);
 	if (csk->csk_family == AF_INET)
 		send_act_open_req(csk, skb, csk->l2t);
+#if IS_ENABLED(CONFIG_IPV6)
 	else
 		send_act_open_req6(csk, skb, csk->l2t);
+#endif
 	neigh_release(n);
 
 	return 0;
diff --git a/drivers/scsi/cxgbi/libcxgbi.c b/drivers/scsi/cxgbi/libcxgbi.c
index d65df6dc106f..64fcb1eea0af 100644
--- a/drivers/scsi/cxgbi/libcxgbi.c
+++ b/drivers/scsi/cxgbi/libcxgbi.c
@@ -230,6 +230,7 @@ struct cxgbi_device *cxgbi_device_find_by_netdev(struct net_device *ndev,
 }
 EXPORT_SYMBOL_GPL(cxgbi_device_find_by_netdev);
 
+#if IS_ENABLED(CONFIG_IPV6)
 static struct cxgbi_device *cxgbi_device_find_by_mac(struct net_device *ndev,
 						     int *port)
 {
@@ -262,6 +263,7 @@ static struct cxgbi_device *cxgbi_device_find_by_mac(struct net_device *ndev,
 		  ndev, ndev->name);
 	return NULL;
 }
+#endif
 
 void cxgbi_hbas_remove(struct cxgbi_device *cdev)
 {
-- 
2.1.2

^ permalink raw reply related

* [PATCH net 4/4] cxgb4i: Remove duplicate call to dst_neigh_lookup()
From: Anish Bhatt @ 2014-10-14 22:13 UTC (permalink / raw)
  To: netdev; +Cc: davem, hariprasad, leedom, kxie, manojmalviya, Anish Bhatt
In-Reply-To: <1413324791-12104-1-git-send-email-anish@chelsio.com>

Signed-off-by: Anish Bhatt <anish@chelsio.com>
---
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
index df176f0e5e60..8c3003b6d591 100644
--- a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
+++ b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
@@ -1317,11 +1317,6 @@ static int init_act_open(struct cxgbi_sock *csk)
 	cxgbi_sock_set_flag(csk, CTPF_HAS_ATID);
 	cxgbi_sock_get(csk);
 
-	n = dst_neigh_lookup(csk->dst, &csk->daddr.sin_addr.s_addr);
-	if (!n) {
-		pr_err("%s, can't get neighbour of csk->dst.\n", ndev->name);
-		goto rel_resource;
-	}
 	csk->l2t = cxgb4_l2t_get(lldi->l2t, n, ndev, 0);
 	if (!csk->l2t) {
 		pr_err("%s, cannot alloc l2t.\n", ndev->name);
-- 
2.1.2

^ permalink raw reply related

* [PATCH net 2/4] cxgb4 : Fix build failure in cxgb4 when ipv6 is disabled/not in-built
From: Anish Bhatt @ 2014-10-14 22:13 UTC (permalink / raw)
  To: netdev; +Cc: davem, hariprasad, leedom, kxie, manojmalviya, Anish Bhatt
In-Reply-To: <1413324791-12104-1-git-send-email-anish@chelsio.com>

Signed-off-by: Anish Bhatt <anish@chelsio.com>
---
 drivers/net/ethernet/chelsio/Kconfig            | 2 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 8 ++++++++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/chelsio/Kconfig b/drivers/net/ethernet/chelsio/Kconfig
index c3ce9df0041a..ac6473f75eb9 100644
--- a/drivers/net/ethernet/chelsio/Kconfig
+++ b/drivers/net/ethernet/chelsio/Kconfig
@@ -68,7 +68,7 @@ config CHELSIO_T3
 
 config CHELSIO_T4
 	tristate "Chelsio Communications T4/T5 Ethernet support"
-	depends on PCI
+	depends on PCI && (IPV6 || IPV6=n)
 	select FW_LOADER
 	select MDIO
 	---help---
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 6222a42f7a2a..ff9fb1629815 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -4359,6 +4359,7 @@ EXPORT_SYMBOL(cxgb4_unregister_uld);
  * success (true) if it belongs otherwise failure (false).
  * Called with rcu_read_lock() held.
  */
+#if IS_ENABLED(CONFIG_IPV6)
 static bool cxgb4_netdev(const struct net_device *netdev)
 {
 	struct adapter *adap;
@@ -4519,6 +4520,7 @@ static void update_clip(const struct adapter *adap)
 	}
 	rcu_read_unlock();
 }
+#endif /* IS_ENABLED(CONFIG_IPV6) */
 
 /**
  *	cxgb_up - enable the adapter
@@ -4565,7 +4567,9 @@ static int cxgb_up(struct adapter *adap)
 	t4_intr_enable(adap);
 	adap->flags |= FULL_INIT_DONE;
 	notify_ulds(adap, CXGB4_STATE_UP);
+#if IS_ENABLED(CONFIG_IPV6)
 	update_clip(adap);
+#endif
  out:
 	return err;
  irq_err:
@@ -6855,14 +6859,18 @@ static int __init cxgb4_init_module(void)
 	if (ret < 0)
 		debugfs_remove(cxgb4_debugfs_root);
 
+#if IS_ENABLED(CONFIG_IPV6)
 	register_inet6addr_notifier(&cxgb4_inet6addr_notifier);
+#endif
 
 	return ret;
 }
 
 static void __exit cxgb4_cleanup_module(void)
 {
+#if IS_ENABLED(CONFIG_IPV6)
 	unregister_inet6addr_notifier(&cxgb4_inet6addr_notifier);
+#endif
 	pci_unregister_driver(&cxgb4_driver);
 	debugfs_remove(cxgb4_debugfs_root);  /* NULL ok */
 }
-- 
2.1.2

^ permalink raw reply related

* [PATCH net 1/4] cxgb4i : Remove duplicated code from cxgb4i, functionality already present in cxgb4
From: Anish Bhatt @ 2014-10-14 22:13 UTC (permalink / raw)
  To: netdev; +Cc: davem, hariprasad, leedom, kxie, manojmalviya, Anish Bhatt
In-Reply-To: <1413324791-12104-1-git-send-email-anish@chelsio.com>

Signed-off-by: Anish Bhatt <anish@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |   7 ++
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c              | 133 ------------------------
 2 files changed, 7 insertions(+), 133 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 321f3d9385c9..6222a42f7a2a 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -4480,6 +4480,13 @@ static int update_root_dev_clip(struct net_device *dev)
 		return ret;
 
 	/* Parse all bond and vlan devices layered on top of the physical dev */
+	root_dev = netdev_master_upper_dev_get_rcu(dev);
+	if (root_dev) {
+		ret = update_dev_clip(root_dev, dev);
+		if (ret)
+			return ret;
+	}
+
 	for (i = 0; i < VLAN_N_VID; i++) {
 		root_dev = __vlan_find_dev_deep_rcu(dev, htons(ETH_P_8021Q), i);
 		if (!root_dev)
diff --git a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
index 79788a12712d..18d0d1c145ad 100644
--- a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
+++ b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
@@ -1635,129 +1635,6 @@ static int cxgb4i_ddp_init(struct cxgbi_device *cdev)
 	return 0;
 }
 
-#if IS_ENABLED(CONFIG_IPV6)
-static int cxgbi_inet6addr_handler(struct notifier_block *this,
-				   unsigned long event, void *data)
-{
-	struct inet6_ifaddr *ifa = data;
-	struct net_device *event_dev = ifa->idev->dev;
-	struct cxgbi_device *cdev;
-	int ret = NOTIFY_DONE;
-
-	if (event_dev->priv_flags & IFF_802_1Q_VLAN)
-		event_dev = vlan_dev_real_dev(event_dev);
-
-	cdev = cxgbi_device_find_by_netdev(event_dev, NULL);
-
-	if (!cdev)
-		return ret;
-
-	switch (event) {
-	case NETDEV_UP:
-		ret = cxgb4_clip_get(event_dev,
-				     (const struct in6_addr *)
-				     ((ifa)->addr.s6_addr));
-		if (ret < 0)
-			return ret;
-
-		ret = NOTIFY_OK;
-		break;
-
-	case NETDEV_DOWN:
-		cxgb4_clip_release(event_dev,
-				   (const struct in6_addr *)
-				   ((ifa)->addr.s6_addr));
-		ret = NOTIFY_OK;
-		break;
-
-	default:
-		break;
-	}
-
-	return ret;
-}
-
-static struct notifier_block cxgbi_inet6addr_notifier = {
-	.notifier_call = cxgbi_inet6addr_handler
-};
-
-/* Retrieve IPv6 addresses from a root device (bond, vlan) associated with
- * a physical device.
- * The physical device reference is needed to send the actual CLIP command.
- */
-static int update_dev_clip(struct net_device *root_dev, struct net_device *dev)
-{
-	struct inet6_dev *idev = NULL;
-	struct inet6_ifaddr *ifa;
-	int ret = 0;
-
-	idev = __in6_dev_get(root_dev);
-	if (!idev)
-		return ret;
-
-	read_lock_bh(&idev->lock);
-	list_for_each_entry(ifa, &idev->addr_list, if_list) {
-		pr_info("updating the clip for addr %pI6\n",
-			ifa->addr.s6_addr);
-		ret = cxgb4_clip_get(dev, (const struct in6_addr *)
-				     ifa->addr.s6_addr);
-		if (ret < 0)
-			break;
-	}
-
-	read_unlock_bh(&idev->lock);
-	return ret;
-}
-
-static int update_root_dev_clip(struct net_device *dev)
-{
-	struct net_device *root_dev = NULL;
-	int i, ret = 0;
-
-	/* First populate the real net device's IPv6 address */
-	ret = update_dev_clip(dev, dev);
-	if (ret)
-		return ret;
-
-	/* Parse all bond and vlan devices layered on top of the physical dev */
-	root_dev = netdev_master_upper_dev_get(dev);
-	if (root_dev) {
-		ret = update_dev_clip(root_dev, dev);
-		if (ret)
-			return ret;
-	}
-
-	for (i = 0; i < VLAN_N_VID; i++) {
-		root_dev = __vlan_find_dev_deep_rcu(dev, htons(ETH_P_8021Q), i);
-		if (!root_dev)
-			continue;
-
-		ret = update_dev_clip(root_dev, dev);
-		if (ret)
-			break;
-	}
-	return ret;
-}
-
-static void cxgbi_update_clip(struct cxgbi_device *cdev)
-{
-	int i;
-
-	rcu_read_lock();
-
-	for (i = 0; i < cdev->nports; i++) {
-		struct net_device *dev = cdev->ports[i];
-		int ret = 0;
-
-		if (dev)
-			ret = update_root_dev_clip(dev);
-		if (ret < 0)
-			break;
-	}
-	rcu_read_unlock();
-}
-#endif /* IS_ENABLED(CONFIG_IPV6) */
-
 static void *t4_uld_add(const struct cxgb4_lld_info *lldi)
 {
 	struct cxgbi_device *cdev;
@@ -1876,10 +1753,6 @@ static int t4_uld_state_change(void *handle, enum cxgb4_state state)
 	switch (state) {
 	case CXGB4_STATE_UP:
 		pr_info("cdev 0x%p, UP.\n", cdev);
-#if IS_ENABLED(CONFIG_IPV6)
-		cxgbi_update_clip(cdev);
-#endif
-		/* re-initialize */
 		break;
 	case CXGB4_STATE_START_RECOVERY:
 		pr_info("cdev 0x%p, RECOVERY.\n", cdev);
@@ -1910,17 +1783,11 @@ static int __init cxgb4i_init_module(void)
 		return rc;
 	cxgb4_register_uld(CXGB4_ULD_ISCSI, &cxgb4i_uld_info);
 
-#if IS_ENABLED(CONFIG_IPV6)
-	register_inet6addr_notifier(&cxgbi_inet6addr_notifier);
-#endif
 	return 0;
 }
 
 static void __exit cxgb4i_exit_module(void)
 {
-#if IS_ENABLED(CONFIG_IPV6)
-	unregister_inet6addr_notifier(&cxgbi_inet6addr_notifier);
-#endif
 	cxgb4_unregister_uld(CXGB4_ULD_ISCSI);
 	cxgbi_device_unregister_all(CXGBI_FLAG_DEV_T4);
 	cxgbi_iscsi_cleanup(&cxgb4i_iscsi_transport, &cxgb4i_stt);
-- 
2.1.2

^ permalink raw reply related

* [PATCH net 0/4] ipv6 and related cleanup for cxgb4/cxgb4i
From: Anish Bhatt @ 2014-10-14 22:13 UTC (permalink / raw)
  To: netdev; +Cc: davem, hariprasad, leedom, kxie, manojmalviya, Anish Bhatt

This patch set removes some duplicated/extraneous code from cxgb4i, and guards
cxgb4 against build failures that depend on the ipv6 tristate setting.

Anish Bhatt (4):
  cxgb4i : Remove duplicated code from cxgb4i, functionality already
    present in cxgb4
  cxgb4 : Fix build failure in cxgb4 when ipv6 is disabled/not in-built
  cxgb4i : All this code is only needed when IPV6 is enabled, disable
    when not required, also fixes -Wunused-function warning
  cxgb4i: Remove duplicate call to dst_neigh_lookup()

 drivers/net/ethernet/chelsio/Kconfig            |   2 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |  15 +++
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c              | 146 ++----------------------
 drivers/scsi/cxgbi/libcxgbi.c                   |   2 +
 4 files changed, 26 insertions(+), 139 deletions(-)

-- 
2.1.2
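
As background for the "ipv6 tristate" remark in the cover letter: Kconfig emits `CONFIG_FOO` for `=y` and `CONFIG_FOO_MODULE` for `=m`, and `IS_ENABLED()` is true in both cases. The sketch below is a simplified userspace imitation of that macro idiom, modelled on include/linux/kconfig.h. The `DEMO_*` and `CONFIG_DEMO_*` names are hypothetical and exist only for this illustration.

```c
/* Simplified imitation of the kernel's IS_ENABLED() machinery. */
#define DEMO_PLACEHOLDER_1 0,
#define DEMO_SECOND(ignored, val, ...) val
#define DEMO_IS_DEFINED(x)	DEMO_IS_DEFINED2(x)
#define DEMO_IS_DEFINED2(val)	DEMO_IS_DEFINED3(DEMO_PLACEHOLDER_##val)
/* If the option is defined as 1, the placeholder expands to "0," and
 * shifts a 1 into second position; otherwise the second argument is 0. */
#define DEMO_IS_DEFINED3(arg_or_junk) DEMO_SECOND(arg_or_junk 1, 0, 0)

#define DEMO_IS_BUILTIN(option)	DEMO_IS_DEFINED(option)
#define DEMO_IS_MODULE(option)	DEMO_IS_DEFINED(option##_MODULE)
#define DEMO_IS_ENABLED(option)	\
	(DEMO_IS_BUILTIN(option) || DEMO_IS_MODULE(option))

/* Pretend configuration: DEMO_PCI=y, DEMO_IPV6=m. */
#define CONFIG_DEMO_PCI 1
#define CONFIG_DEMO_IPV6_MODULE 1

#if DEMO_IS_ENABLED(CONFIG_DEMO_IPV6)
/* Only compiled when IPv6 is built in or modular. */
static int demo_ipv6_setup(void)
{
	return 6;
}
#endif

static int demo_init(void)
{
	int features = 4;

#if DEMO_IS_ENABLED(CONFIG_DEMO_IPV6)
	features += demo_ipv6_setup();
#endif
	return features;
}
```

Because the check is also true for `=m`, a built-in driver guarded this way could still end up referencing symbols that live in a module. That appears to be what the `depends on PCI && (IPV6 || IPV6=n)` line in patch 2/4 is meant to rule out.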

^ permalink raw reply
