Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 0/5] Candidate fix for increased number of GFP_ATOMIC failures V2
From: Karol Lewandowski @ 2009-10-28 11:42 UTC (permalink / raw)
  To: Mel LKML
  Cc: Karol Lewandowski, Mel Gorman, Frans Pop, Jiri Kosina,
	Sven Geggus, Tobias Oetiker, Rafael J. Wysocki, David Miller,
	Reinette Chatre, Kalle Valo, David Rientjes, KOSAKI Motohiro,
	Mohamed Abbas, Jens Axboe, John W. Linville, Pekka Enberg,
	Bartlomiej Zolnierkiewicz, Greg Kroah-Hartman,
	Stephan von Krawczynski, Kernel Testers List,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	"linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org" <li
In-Reply-To: <9ec2d7290910240646p75b93c68v6ea1648d628a9660-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Sat, Oct 24, 2009 at 02:46:56PM +0100, Mel LKML wrote:
> Hi,

Hi,

> On 10/23/09, Karol Lewandowski <karol.k.lewandowski-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> > On Fri, Oct 23, 2009 at 06:58:10PM +0200, Karol Lewandowski wrote:

> > Ok, I've tested patches 1+2+4 and bug, while very hard to trigger, is
> > still present. I'll test complete 1-4 patchset as time permits.

Sorry for silence, I've been quite busy lately.


> And also patch 5 please which is the revert. Patch 5 as pointed out is
> probably a red herring. Hwoever, it has changed the timing and made a
> difference for some testing so I'd like to know if it helps yours as
> well.

I've tested patches 1+2+3+4 in my normal usage scenario (do some work,
suspend, do work, suspend, ...) and it failed today after 4 days (== 4
suspend-resume cycles).

I'll test 1-5 now.


Thanks.

^ permalink raw reply

* Re: [net-next-2.6 PATCH] be2net:Changes to update ethtool get_settings function to return appropriate values.
From: David Miller @ 2009-10-28 11:15 UTC (permalink / raw)
  To: sarveshwarb; +Cc: netdev
In-Reply-To: <20091022132949.GA23701@serverengines.com>

From: Sarveshwar Bandi <sarveshwarb@serverengines.com>
Date: Thu, 22 Oct 2009 19:00:00 +0530

> Update ethtool get_settings function to:
> - get current link speed settings from controller
> - get port transceiver type from controller
> - fill appropriate values for supported, phy_address
> 
> Signed-off-by: Sarveshwar Bandi <sarveshwarb@serverengines.com>

Applied, thanks.

^ permalink raw reply

* Re: [net-next-2.6 PATCH] e100: Fix to allow systems with FW based cards to resume from STD
From: David Miller @ 2009-10-28 11:14 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, david.graham
In-Reply-To: <20091023025904.7057.58001.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Thu, 22 Oct 2009 19:59:29 -0700

> From: David Graham <david.graham@intel.com>
> 
> Devices with loadable firmware must have their firmware reloaded
> after the system resumes from sleep, but the request_firmare()
> API is not available at this point in the resume flow because
> tasks are not yet running, and the system will hang if it is
> called. Work around this issue by only calling request_firmware()
> for a device's first firmware load, and cache a copy of the pointer
> to the firmware blob for that device, so that we may reload firmware
> images even during resume.
> 
> Signed-off-by: David Graham <david.graham@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] vmxnet3: remove duplicated #include
From: David Miller @ 2009-10-28 11:13 UTC (permalink / raw)
  To: sbhatewara; +Cc: netdev, weiyi.huang, pv-drivers
In-Reply-To: <alpine.LRH.2.00.0910221634130.23769@sbhatewara-dev1.eng.vmware.com>

From: Shreyas Bhatewara <sbhatewara@vmware.com>
Date: Thu, 22 Oct 2009 16:58:33 -0700 (PDT)

> 
> 
> Remove duplicate headerfile includes from vmxnet3_int.h
> 
> Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
> Signed-off-by: Shreyas Bhatewara <sbhatewara@vmware.com>
> Signed-off-by: Bhavesh Davda <davda@vmware.com>

This patch doesn't apply to net-next-2.6, please resend.

^ permalink raw reply

* Re: [PATCH NEXT 0/6] netxen: changes for new chip
From: David Miller @ 2009-10-28 11:11 UTC (permalink / raw)
  To: dhananjay; +Cc: netdev
In-Reply-To: <1256436243-5736-1-git-send-email-dhananjay@netxen.com>

From: Dhananjay Phadke <dhananjay@netxen.com>
Date: Sat, 24 Oct 2009 19:03:57 -0700

> Series of 6 patches for net-next-2.6, please apply.

All applied, thanks.

^ permalink raw reply

* Re: [PATCH] virtio-net: fix data corruption with OOM
From: David Miller @ 2009-10-28 11:03 UTC (permalink / raw)
  To: rusty; +Cc: netdev, mst
In-Reply-To: <200910282126.58902.rusty@rustcorp.com.au>

From: Rusty Russell <rusty@rustcorp.com.au>
Date: Wed, 28 Oct 2009 21:26:58 +1030

> On Tue, 27 Oct 2009 11:57:20 am you wrote:
>> Anything in a reply to a patch that looks like a signoff or ACK,
>> patchwork adds to the commit message in the mbox blob it spits out for
>> me.
> 
> In case this got lost in the meta-discussion:

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] net: Corrected spelling error heurestics->heuristics
From: David Miller @ 2009-10-28 11:02 UTC (permalink / raw)
  To: apetlund; +Cc: netdev, trivial, linux-kernel, ilpo.jarvinen
In-Reply-To: <4AE6F539.1020107@simula.no>

From: Andreas Petlund <apetlund@simula.no>
Date: Tue, 27 Oct 2009 14:27:21 +0100

> Corrected a spelling error in a function name.
> 
> Signed-off-by: Andreas Petlund <apetlund@simula.no>

Applied to net-next-2.6, thanks.

^ permalink raw reply

* Re: [PATCH net-next-2.6] net: sysfs: ethtool_ops can be NULL
From: David Miller @ 2009-10-28 11:02 UTC (permalink / raw)
  To: andy; +Cc: eric.dumazet, netdev
In-Reply-To: <20091026134033.GD1639@gospo.rdu.redhat.com>

From: Andy Gospodarek <andy@greyhouse.net>
Date: Mon, 26 Oct 2009 09:40:33 -0400

> On Mon, Oct 26, 2009 at 12:23:33PM +0100, Eric Dumazet wrote:
>> commit d519e17e2d01a0ee9abe083019532061b4438065
>> (net: export device speed and duplex via sysfs)
>> made the wrong assumption that netdev->ethtool_ops was always set.
>> 
>> This makes possible to crash kernel and let rtnl in locked state.
>> 
>> modprobe dummy
>> ip link set dummy0 up
>> (udev runs and crash)
>> 
>> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
 ...
> Nice catch, Eric.
> 
> Acked-by: Andy Gospodarek <andy@greyhouse.net>

Applied.

^ permalink raw reply

* Re: [PATCH] via-velocity: Remove private device list
From: David Miller @ 2009-10-28 11:02 UTC (permalink / raw)
  To: ben; +Cc: romieu, netdev
In-Reply-To: <1256501329.3136.109.camel@localhost>

From: Ben Hutchings <ben@decadent.org.uk>
Date: Sun, 25 Oct 2009 20:08:49 +0000

> via-velocity maintains a list of its devices in order to determine
> whether a netdev notification applies to one of them.  That can be
> determined simply by checking the netdev_ops pointer, so the list can
> be removed.
> 
> Compile-tested only.
> 
> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

Looks good to me, applied to net-next-2.6

^ permalink raw reply

* Re: [PATCH V2]NET/KS8695: add support NAPI for Rx
From: Ben Dooks @ 2009-10-28 10:57 UTC (permalink / raw)
  To: Figo.zhang; +Cc: David S. Miller, netdev
In-Reply-To: <1256653422.2148.23.camel@myhost>

Figo.zhang wrote:
> Add support NAPI Rx API for KS8695NET driver.
> 
> v2, change the Rx function to NAPI.
> 
> in <KS8695X Integrated Multi-port Gateway Solution Register Description
>  v1.0>:
> 
> Interrupt Enable Register (offset 0xE204)
> Bit29 : WAN MAC Receive Interrupt Enable
> Bit16 : LAN MAC Receive Interrupt Enable
> 
> Interrupt Status Register (Offset 0xF208)
> Bit29: WAN MAC Receive Status
> Bit16: LAN MAC Receive Status
> 
> see arch/arm/mach-ks8695/devices.c:
> ks8695_wan_resources[] and ks8695_lan_resources[]
> have IORESOURCE_IRQ , it have define the RX irq,
> for wan, irq = 29; for lan ,irq = 16.
> so we can do this read the interrupt status:
> 
> unsigned long mask_bit = 1 << ksp->rx_irq;
> status = readl(KS8695_IRQ_VA + KS8695_INTST);

It would be nice to see some form of API addition to
the interrupt system to ack interrupts that have been
handled like this, since the irq layer is already
tracking the necessary IRQ->handler mappings.

-- 
Ben Dooks, Software Engineer, Simtec Electronics

http://www.simtec.co.uk/

^ permalink raw reply

* Re: [PATCH 2/2] tc35815: Enable NAPI
From: David Miller @ 2009-10-28 10:57 UTC (permalink / raw)
  To: anemo; +Cc: netdev, ralf.roesch
In-Reply-To: <1256564782-2781-2-git-send-email-anemo@mba.ocn.ne.jp>

From: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Date: Mon, 26 Oct 2009 22:46:22 +0900

> This driver has NAPI code but it has been disabled.  Enable it now.
> The non-napi code will be removed lator.
> 
> Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>

Applied.

Please remove the NAPI enabling macro and the tests for it.
NAPI support should be unconditional.

If people want to test the pre-NAPI behavior, they can check
out an older copy of the driver quite easily.

Thanks.

^ permalink raw reply

* Re: [PATCH] virtio-net: fix data corruption with OOM
From: Rusty Russell @ 2009-10-28 10:56 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, mst
In-Reply-To: <20091026.182720.81248604.davem@davemloft.net>

On Tue, 27 Oct 2009 11:57:20 am you wrote:
> Anything in a reply to a patch that looks like a signoff or ACK,
> patchwork adds to the commit message in the mbox blob it spits out for
> me.

In case this got lost in the meta-discussion:

Subject: virtio-net: fix data corruption with OOM
Date: Sun, 25 Oct 2009 19:03:40 +0200
From: "Michael S. Tsirkin" <mst@redhat.com>

virtio net used to unlink skbs from send queues on error,
but ever since 48925e372f04f5e35fec6269127c62b2c71ab794
we do not do this. This causes guest data corruption and crashes
with vhost since net core can requeue the skb or free it without
it being taken off the list.

This patch fixes this by queueing the skb after successful
transmit.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (+ comment)
---
 drivers/net/virtio_net.c |   14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -516,8 +516,7 @@ again:
 	/* Free up any pending old buffers before queueing new ones. */
 	free_old_xmit_skbs(vi);
 
-	/* Put new one in send queue and do transmit */
-	__skb_queue_head(&vi->send, skb);
+	/* Try to transmit */
 	capacity = xmit_skb(vi, skb);
 
 	/* This can happen with OOM and indirect buffers. */
@@ -531,8 +530,17 @@ again:
 		}
 		return NETDEV_TX_BUSY;
 	}
+	vi->svq->vq_ops->kick(vi->svq);
 
-	vi->svq->vq_ops->kick(vi->svq);
+	/*
+	 * Put new one in send queue.  You'd expect we'd need this before
+	 * xmit_skb calls add_buf(), since the callback can be triggered
+	 * immediately after that.  But since the callback just triggers
+	 * another call back here, normal network xmit locking prevents the
+	 * race.
+	 */
+	__skb_queue_head(&vi->send, skb);
+
 	/* Don't wait up for transmitted skbs to be freed. */
 	skb_orphan(skb);
 	nf_reset(skb);


^ permalink raw reply

* Re: [PATCH 1/2] tc35815: Fix return value of tc35815_do_interrupt when NAPI enabled
From: David Miller @ 2009-10-28 10:57 UTC (permalink / raw)
  To: anemo; +Cc: netdev, ralf.roesch
In-Reply-To: <1256564782-2781-1-git-send-email-anemo@mba.ocn.ne.jp>

From: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Date: Mon, 26 Oct 2009 22:46:21 +0900

> Return received count correctly even if tx completed at the same time.
> Currently NAPI is disabled for this driver so this patch does not fix
> any real problem.
> 
> Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>

Applied.

^ permalink raw reply

* Re: [PATCH V2]NET/KS8695: add support NAPI for Rx
From: David Miller @ 2009-10-28 10:55 UTC (permalink / raw)
  To: figo1802; +Cc: dsilvers, netdev, ben
In-Reply-To: <1256653422.2148.23.camel@myhost>

From: "Figo.zhang" <figo1802@gmail.com>
Date: Tue, 27 Oct 2009 22:23:42 +0800

> Add support NAPI Rx API for KS8695NET driver.
> 
> v2, change the Rx function to NAPI.
 ...
> Signed-off-by: Figo.zhang <figo1802@gmail.com>

Applied to net-next-2.6, thanks.

^ permalink raw reply

* Re: [PATCH net-2.6] sfc: Really allow RX checksum offload to be disabled
From: Ben Hutchings @ 2009-10-28 10:53 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <20091028.024940.181264224.davem@davemloft.net>

On Wed, 2009-10-28 at 02:49 -0700, David Miller wrote:
> From: Ben Hutchings <bhutchings@solarflare.com>
> Date: Tue, 27 Oct 2009 19:44:33 +0000
> 
> > We have never checked the efx_nic::rx_checksum_enabled flag everywhere
> > we should, and since the switch to GRO we don't check it anywhere.
> > It's simplest to check it in the one place where we initialise the
> > per-packet checksummed flag.
> > 
> > Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> > Cc: stable@kernel.org
> > ---
> > I'm not sure whether this is serious enough to merit a stable update.
> > It's not a recent regression.
> 
> This patch only applies to net-next-2.6, so I can't see how it could
> be a -stable candidate :-)
> 
> So I've applied it there.

The register name update in net-next-2.6 changed the context for this
patch.  I'll send a new patch that will apply to the earlier versions.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [PATCH net-2.6] sfc: Set ip_summed correctly for page buffers passed to GRO
From: David Miller @ 2009-10-28 10:44 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1256655057.2794.4.camel@achroite>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Tue, 27 Oct 2009 14:50:57 +0000

> Page buffers containing packets with an incorrect checksum or using a
> protocol not handled by hardware checksum offload were previously not
> passed to LRO.  The conversion to GRO changed this, but did not set
> the ip_summed value accordingly.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

> This affects 2.6.31 and seems like a candidate for a stable update.

Queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH] cnic: Fix L2CTX_STATUSB_NUM offset in context memory.
From: David Miller @ 2009-10-28 10:42 UTC (permalink / raw)
  To: mchan; +Cc: davem, netdev, benli
In-Reply-To: <1256662728-21864-1-git-send-email-mchan@broadcom.com>

From: "Michael Chan" <mchan@broadcom.com>
Date: Tue, 27 Oct 2009 08:58:48 -0800

> The BNX2_L2CTX_STATUSB_NUM definition needs to be changed to match
> the recent firmware update:
> 
> commit 078b0735881c7969aaf21469f3577831cddd9f8c
> bnx2: Update firmware to 5.0.0.j3.
> 
> Without the fix, bnx2 can crash intermittently in bnx2_rx_int() when
> iSCSI is enabled.
> 
> Signed-off-by: Michael Chan <mchan@broadcom.com>
> Signed-off-by: Benjamin Li <benli@broadcom.com>

Applied to net-2.6, but please be explicit about what tree you
want me to apply this to in the future.

Sure I could deduce this by running "git describe" on that
commit ID mentioned in the commit message, but why not be
explicit? :-)

^ permalink raw reply

* Re: [net-next-2.6 PATCH 01/23] igb: add support for seperate tx-usecs setting in ethtool
From: David Miller @ 2009-10-28 10:39 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, alexander.h.duyck
In-Reply-To: <20091028094540.13156.2637.stgit@localhost.localdomain>


All applied to net-next-2.6, but then I had to add the following
patch to kill a warning:

igb: Fix warnings in igb_set_ringparam()

drivers/net/igb/igb_ethtool.c: In function ‘igb_set_ringparam’:
drivers/net/igb/igb_ethtool.c:744: warning: comparison of distinct pointer types lacks a cast
drivers/net/igb/igb_ethtool.c:748: warning: comparison of distinct pointer types lacks a cast

Casts were to u16 on the constant, but the type of new_{r,t}x_count is
u32.  Cast to u32 instead.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/igb/igb_ethtool.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/igb/igb_ethtool.c b/drivers/net/igb/igb_ethtool.c
index d24b902..90b89a8 100644
--- a/drivers/net/igb/igb_ethtool.c
+++ b/drivers/net/igb/igb_ethtool.c
@@ -741,11 +741,11 @@ static int igb_set_ringparam(struct net_device *netdev,
 		return -EINVAL;
 
 	new_rx_count = min(ring->rx_pending, (u32)IGB_MAX_RXD);
-	new_rx_count = max(new_rx_count, (u16)IGB_MIN_RXD);
+	new_rx_count = max(new_rx_count, (u32)IGB_MIN_RXD);
 	new_rx_count = ALIGN(new_rx_count, REQ_RX_DESCRIPTOR_MULTIPLE);
 
 	new_tx_count = min(ring->tx_pending, (u32)IGB_MAX_TXD);
-	new_tx_count = max(new_tx_count, (u16)IGB_MIN_TXD);
+	new_tx_count = max(new_tx_count, (u32)IGB_MIN_TXD);
 	new_tx_count = ALIGN(new_tx_count, REQ_TX_DESCRIPTOR_MULTIPLE);
 
 	if ((new_tx_count == adapter->tx_ring_count) &&
-- 
1.6.5.1


^ permalink raw reply related

* Re: [PATCH] Multicast packet reassembly can fail
From: Eric Dumazet @ 2009-10-28 10:18 UTC (permalink / raw)
  To: Steve Chen; +Cc: netdev
In-Reply-To: <1256683583.3153.389.camel@linux-1lbu>

Steve Chen a écrit :
> Multicast packet reassembly can fail
> 
> When multicast connections with multiple fragments are received by the same
> node from more than one Ethernet ports, race condition between fragments
> from each Ethernet port can cause fragment reassembly to fail leading to
> packet drop.  This is because packets from each Ethernet port appears identical
> to the the code that reassembles the Ethernet packet.
> 
> The solution is evaluate the Ethernet interface number in addition to all other
> parameters so that every packet can be uniquely identified.  The existing
> iif field in struct ipq is now used to generate the hash key, and iif is also
> used for comparison in case of hash collision.
> 
> Please note that q->saddr ^ (q->iif << 5) is now being passed into
> ipqhashfn to generate the hash key.  This is borrowed from the routing
> code.
> 
> Signed-off-by: Steve Chen <schen@mvista.com>
> Signed-off-by: Mark Huth <mhuth@mvista.com>
> 

This makes no sense to me, but I need to check the code.

How interface could matter in IP defragmentation ?
And why multicast is part of the equation ?

If defrag fails, this must be for other reason,
and probably needs another fix.

Check line 219 of net/ipv4/inet_fragment.c

#ifdef CONFIG_SMP
        /* With SMP race we have to recheck hash table, because
         * such entry could be created on other cpu, while we
         * promoted read lock to write lock.
         */
        hlist_for_each_entry(qp, n, &f->hash[hash], list) {
                if (qp->net == nf && f->match(qp, arg)) {
                        atomic_inc(&qp->refcnt);
                        write_unlock(&f->lock);
                        qp_in->last_in |= INET_FRAG_COMPLETE;   <<< HERE >>>
                        inet_frag_put(qp_in, f);
                        return qp;
                }
        }
#endif

I really wonder why we set INET_FRAG_COMPLETE here

^ permalink raw reply

* Re: [PATCH v3 4/7] Add the no SACK route option feature
From: Gilad Ben-Yossef @ 2009-10-28 10:18 UTC (permalink / raw)
  To: William Allen Simpson; +Cc: netdev, ori
In-Reply-To: <4AE5D089.2050606@gmail.com>

William Allen Simpson wrote:

> Gilad Ben-Yossef wrote:
>> Implement querying and acting upon the no sack bit in the features
>> field.
>>
>> Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
>> Sigend-off-by: Ori Finkelman <ori@comsleep.com>
>> Sigend-off-by: Yony Amit <yony@comsleep.com>
>>
> Please explain how this code turns SACK on when it is off globally?
>
> As both Eric and I asked?
It doesn't. Please see my discussion with Eric for the why.

In short, doing so introduce a very subtle change to what the existing 
interface do today, which will break
backwards compatibility by changing the meaning of writing zero to the 
relevant sysctl. I don't want to be hunt down by angry sys admins :-)

Thanks,
Gilad

-- 
Gilad Ben-Yossef
Chief Coffee Drinker & CTO
Codefidence Ltd.

Web:   http://codefidence.com
Cell:  +972-52-8260388
Skype: gilad_codefidence
Tel:   +972-8-9316883 ext. 201
Fax:   +972-8-9316884
Email: gilad@codefidence.com

Check out our Open Source technology and training blog - http://tuxology.net

	"The biggest risk you can take it is to take no risk."
		-- Mark Zuckerberg and probably others

^ permalink raw reply

* Re: [PATCH v3 1/7] Only parse time stamp TCP option in time wait sock
From: Gilad Ben-Yossef @ 2009-10-28 10:14 UTC (permalink / raw)
  To: William Allen Simpson; +Cc: netdev, ori, Yony Amit
In-Reply-To: <4AE5D4AE.2080108@gmail.com>

Hi William,

William Allen Simpson wrote:

> Gilad Ben-Yossef wrote:
>> Since we only use tcp_parse_options here to check for the exietence
>> of TCP timestamp option in the header, it is better to call with
>> the "established" flag on.
>>
> Please explain how this patch is required for the other patches?
Gladly (and suggestions to do it differently are welcome) :

For the purpose of the patch tcp_parse_options was changed to consult 
dst_entry options when parsing non established packets.

This means that for any place that we call tcp_parse_options with the 
established parameter set to false, we need to supply it with a dst_entry.

In all other locations in kernel code when tcp_parse_options is called 
such a dst_entry is easily available already.

The time wait mini socket exists so that we would not waste resource 
keeping around the full socket state of a "real socket". As such, it 
does not cache the dst_entry. Adding it to the TIME_WAIT mini sockets 
jsut for this purpose defeats the purpose of having a mini socket in the 
first place.

One other possible way to go about it is to re-compute the dst_entry at 
this location, but this seemed an expensive operation to perform for 
what should be a light weight operation. I asked myself if there might 
be another way?

So I took a good look at the code and discovered that there is no need 
to call tcp_parse_options there in "non established" mode at all.
>
> And more importantly, why it is better to call with established on?
Sure. This is kind of long written down, although it's really simple. I 
will try to describe it as best I can.

Take a look at what tcp_parse_options() does as a function -

It has only one output: changing the fields of the tcp_options_received 
struct  which it gets a pointer to as a parameter. It also has a single 
side effect: it updates the  SKB TCP control block sacked field, if a 
SACK option is detected in the packet header.

Its behavior is dictated by the established parameter. If false, it will 
try to parse all supported TCP options, if found in the packet header. 
If true it will only try to parse the time stamp and SACK options.

Now take a look what happens at tcp_timewait_state_process() when we 
call tcp_parse_options() -

We allocate (on stack) a temporary tcp_options_received struct, and if 
our TIME_WAIT mini socket had recent timestamp data 
(tcptw->tw_ts_recent_stamp), we call tcp_parse_options() with our 
temporay tcp_options_received struct.

Here is the important bit:  we never ever look at anything in the 
tcp_options_received struct after the call returns, except for the time 
stamp data if it is available!

So, passing established as false here makes us try to parse, if found in 
the packet, a bunch of options that we never ever look at the result of. 
(The fact that time wait minisocket  code also zeros the saw_tstamp 
before the call to tcp_parse_options although the same field is being 
zeroed again inside the function is just icing on the cake...)

I have one more issue to explain, and this is regarding the single side 
effect tcp_parse_option has - if the SACK option is found, 
tcp_parse_options updates the skb control block sacked field. However, 
not that it does this regardless of whether established is true or 
false, so it doesn't matter which we pass. (I will leave out the fact 
that whether or not the SACK option is parsed depends on a non 
initialized field of the tcp_options_received struct now as an obscure 
detail... nothing obviously looks at that later).

So bottom line: passing a true value in established does the exact same 
thing, result wise, as current code, except it does so in fewer cycles.

I do confess to having goofed here in one regard: the patch I posted did 
not set the tstamp_ok field of the tcp_options_received struct, which 
can lead to randomly not parsing the time stamp option even when you 
need to.

Perhaps this is what masks my intent. This is a bug of course and I'm 
grateful for you for helping me catch it :-)

I will send an updated patch set with this fixed ASAP.

> And most importantly, what end cases you considered, and how this
> interacts with the proposed rfc1323bis changes, especially on reset?
>
My whole point was that this "change" does not change the behavior of 
the code in any way. In fact, it is no different then a compile time 
optimization (don't execute code paths nothing later uses the result 
thereof) and if the compiler was smart enough, it would have done the 
same. So corner cases and RFC compliance stay exactly as before.

I hope I managed to explain myself better this time around and thanks 
again for taking the time to review this. ;-)

Gilad

-- 
Gilad Ben-Yossef
Chief Coffee Drinker & CTO
Codefidence Ltd.

Web:   http://codefidence.com
Cell:  +972-52-8260388
Skype: gilad_codefidence
Tel:   +972-8-9316883 ext. 201
Fax:   +972-8-9316884
Email: gilad@codefidence.com

Check out our Open Source technology and training blog - http://tuxology.net

	"The biggest risk you can take it is to take no risk."
		-- Mark Zuckerberg and probably others

^ permalink raw reply

* Re: [PATCH net-next] bnx2x: Do Tx handling in a separate tasklet.
From: David Miller @ 2009-10-28  9:57 UTC (permalink / raw)
  To: vladz; +Cc: IMCEAMAILTO-davem+40davemloft+2Enet, eilong, netdev
In-Reply-To: <8628FE4E7912BF47A96AE7DD7BAC0AADCB2CFF2028@SJEXCHCCR02.corp.ad.broadcom.com>

From: "Vladislav Zolotarov" <vladz@broadcom.com>
Date: Wed, 28 Oct 2009 02:54:37 -0700

> I'd like to start from your last remark: you r absolutely right, and this is the problem we have in the current net-next driver. More than that, this patch is fixing this problem: it moved liberation of Tx SKBs from hardIRQ context (ISR) to the softIRQ context (tasklet) thereby resolving the problem u've mentioned. So, total agreement with u on this one. I must have named the patch differently to emphasize it.
> 
> I'd like to summarize the patch I've sent:
> - Take Tx SKB liberation out of hardIRQ.
> - Instead schedule a DPC that handles Tx work.
> - Optimize the access to status block indices: read only the index we are about to use in the current context.
> 
> So, could u, pls., apply the patch in order to fix the problem we currently have in bnx2x?

There is no reason not to use NAPI to achieve this objective and that's
the main objection I have to your patch.

Using NAPI will not only allow you to move the SKB freeing to softirq
context but it will also provide fairness between multiple NAPI
contexts active at the same time on the same cpu.

Furthermore, if you combine RX and TX NAPI work for a specific queue
into the same NAPI context, TX liberation can run first and provide
fresh CPU local SKBs for RX packet input processing created replies
to allocate.

You haven't addressed any of that, and I am not going to apply your
patch becuase I don't want your driver to set a precedence here.

^ permalink raw reply

* [net-next-2.6 PATCH 23/23] igb: cleanup whitespace issues in igb_main.c
From: Jeff Kirsher @ 2009-10-28  9:52 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Alexander Duyck, Jeff Kirsher
In-Reply-To: <20091028094540.13156.2637.stgit@localhost.localdomain>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This patch goes through and cleans up whitespace issues in igb_main.c
to help improve readability.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/igb/igb_main.c |   41 +++++++++++++++++++----------------------
 1 files changed, 19 insertions(+), 22 deletions(-)

diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index 1a6c074..b044c98 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -1006,7 +1006,6 @@ static void igb_release_hw_control(struct igb_adapter *adapter)
 			ctrl_ext & ~E1000_CTRL_EXT_DRV_LOAD);
 }
 
-
 /**
  * igb_get_hw_control - get control of the h/w from f/w
  * @adapter: address of board private structure
@@ -1067,7 +1066,6 @@ static void igb_configure(struct igb_adapter *adapter)
  * igb_up - Open the interface and prepare it to handle traffic
  * @adapter: board private structure
  **/
-
 int igb_up(struct igb_adapter *adapter)
 {
 	struct e1000_hw *hw = &adapter->hw;
@@ -1288,7 +1286,7 @@ void igb_reset(struct igb_adapter *adapter)
 }
 
 static const struct net_device_ops igb_netdev_ops = {
-	.ndo_open 		= igb_open,
+	.ndo_open		= igb_open,
 	.ndo_stop		= igb_close,
 	.ndo_start_xmit		= igb_xmit_frame_adv,
 	.ndo_get_stats		= igb_get_stats,
@@ -1444,7 +1442,6 @@ static int __devinit igb_probe(struct pci_dev *pdev,
 	netdev->features |= NETIF_F_IPV6_CSUM;
 	netdev->features |= NETIF_F_TSO;
 	netdev->features |= NETIF_F_TSO6;
-
 	netdev->features |= NETIF_F_GRO;
 
 	netdev->vlan_features |= NETIF_F_TSO;
@@ -1569,7 +1566,6 @@ static int __devinit igb_probe(struct pci_dev *pdev,
 	}
 
 #endif
-
 	switch (hw->mac.type) {
 	case e1000_82576:
 		/*
@@ -1624,8 +1620,8 @@ static int __devinit igb_probe(struct pci_dev *pdev,
 	/* print bus type/speed/width info */
 	dev_info(&pdev->dev, "%s: (PCIe:%s:%s) %pM\n",
 		 netdev->name,
-		 ((hw->bus.speed == e1000_bus_speed_2500)
-		  ? "2.5Gb/s" : "unknown"),
+		 ((hw->bus.speed == e1000_bus_speed_2500) ? "2.5Gb/s" :
+		                                            "unknown"),
 		 ((hw->bus.width == e1000_bus_width_pcie_x4) ? "Width x4" :
 		  (hw->bus.width == e1000_bus_width_pcie_x2) ? "Width x2" :
 		  (hw->bus.width == e1000_bus_width_pcie_x1) ? "Width x1" :
@@ -1658,8 +1654,8 @@ err_sw_init:
 err_ioremap:
 	free_netdev(netdev);
 err_alloc_etherdev:
-	pci_release_selected_regions(pdev, pci_select_bars(pdev,
-	                             IORESOURCE_MEM));
+	pci_release_selected_regions(pdev,
+	                             pci_select_bars(pdev, IORESOURCE_MEM));
 err_pci_reg:
 err_dma:
 	pci_disable_device(pdev);
@@ -1723,11 +1719,12 @@ static void __devexit igb_remove(struct pci_dev *pdev)
 		dev_info(&pdev->dev, "IOV Disabled\n");
 	}
 #endif
+
 	iounmap(hw->hw_addr);
 	if (hw->flash_address)
 		iounmap(hw->flash_address);
-	pci_release_selected_regions(pdev, pci_select_bars(pdev,
-	                             IORESOURCE_MEM));
+	pci_release_selected_regions(pdev,
+	                             pci_select_bars(pdev, IORESOURCE_MEM));
 
 	free_netdev(netdev);
 
@@ -2288,9 +2285,7 @@ void igb_setup_rctl(struct igb_adapter *adapter)
 	 */
 	rctl |= E1000_RCTL_SECRC;
 
-	/*
-	 * disable store bad packets and clear size bits.
-	 */
+	/* disable store bad packets and clear size bits. */
 	rctl &= ~(E1000_RCTL_SBP | E1000_RCTL_SZ_256);
 
 	/* enable LPE to prevent packets larger than max_frame_size */
@@ -2916,7 +2911,8 @@ static void igb_watchdog(unsigned long data)
 static void igb_watchdog_task(struct work_struct *work)
 {
 	struct igb_adapter *adapter = container_of(work,
-					struct igb_adapter, watchdog_task);
+	                                           struct igb_adapter,
+                                                   watchdog_task);
 	struct e1000_hw *hw = &adapter->hw;
 	struct net_device *netdev = adapter->netdev;
 	struct igb_ring *tx_ring = adapter->tx_ring;
@@ -2935,14 +2931,14 @@ static void igb_watchdog_task(struct work_struct *work)
 			/* Links status message must follow this format */
 			printk(KERN_INFO "igb: %s NIC Link is Up %d Mbps %s, "
 				 "Flow Control: %s\n",
-			         netdev->name,
-				 adapter->link_speed,
-				 adapter->link_duplex == FULL_DUPLEX ?
+			       netdev->name,
+			       adapter->link_speed,
+			       adapter->link_duplex == FULL_DUPLEX ?
 				 "Full Duplex" : "Half Duplex",
-				 ((ctrl & E1000_CTRL_TFCE) && (ctrl &
-				 E1000_CTRL_RFCE)) ? "RX/TX" : ((ctrl &
-				 E1000_CTRL_RFCE) ? "RX" : ((ctrl &
-				 E1000_CTRL_TFCE) ? "TX" : "None")));
+			       ((ctrl & E1000_CTRL_TFCE) &&
+			        (ctrl & E1000_CTRL_RFCE)) ? "RX/TX" :
+			       ((ctrl & E1000_CTRL_RFCE) ?  "RX" :
+			       ((ctrl & E1000_CTRL_TFCE) ?  "TX" : "None")));
 
 			/* tweak tx_queue_len according to speed/duplex and
 			 * adjust the timeout factor */
@@ -3724,6 +3720,7 @@ static int igb_change_mtu(struct net_device *netdev, int new_mtu)
 
 	/* igb_down has a dependency on max_frame_size */
 	adapter->max_frame_size = max_frame;
+
 	/* NOTE: netdev_alloc_skb reserves 16 bytes, and typically NET_IP_ALIGN
 	 * means we reserve 2 more, this pushes us to allocate from the next
 	 * larger slab size.


^ permalink raw reply related

* [net-next-2.6 PATCH 22/23] igb: open up SCTP checksum offloads to all MACs 82576 and newer
From: Jeff Kirsher @ 2009-10-28  9:52 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Alexander Duyck, Jeff Kirsher
In-Reply-To: <20091028094540.13156.2637.stgit@localhost.localdomain>

From: Alexander Duyck <alexander.h.duyck@intel.com>

Going forward the plan is to have the MACs support SCTP checksum offloads
so change the check from == to >=.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/igb/igb_main.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index 846e64f..1a6c074 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -1456,7 +1456,7 @@ static int __devinit igb_probe(struct pci_dev *pdev,
 	if (pci_using_dac)
 		netdev->features |= NETIF_F_HIGHDMA;
 
-	if (adapter->hw.mac.type == e1000_82576)
+	if (hw->mac.type >= e1000_82576)
 		netdev->features |= NETIF_F_SCTP_CSUM;
 
 	adapter->en_mng_pt = igb_enable_mng_pass_thru(hw);


^ permalink raw reply related

* [net-next-2.6 PATCH 21/23] igb: limit minimum mtu to 68 to keep ip bound to interface
From: Jeff Kirsher @ 2009-10-28  9:52 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Alexander Duyck, Jeff Kirsher
In-Reply-To: <20091028094540.13156.2637.stgit@localhost.localdomain>

From: Alexander Duyck <alexander.h.duyck@intel.com>

Limit the minimum mtu to 68 in order to prevent ip from being unbound from
the interface.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/igb/igb_main.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index 264ff00..846e64f 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -3709,8 +3709,7 @@ static int igb_change_mtu(struct net_device *netdev, int new_mtu)
 	int max_frame = new_mtu + ETH_HLEN + ETH_FCS_LEN;
 	u32 rx_buffer_len, i;
 
-	if ((max_frame < ETH_ZLEN + ETH_FCS_LEN) ||
-	    (max_frame > MAX_JUMBO_FRAME_SIZE)) {
+	if ((new_mtu < 68) || (max_frame > MAX_JUMBO_FRAME_SIZE)) {
 		dev_err(&pdev->dev, "Invalid MTU setting\n");
 		return -EINVAL;
 	}


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox