Netdev List

* [PATCH] MAINTAINERS: reflect actual changes in IEEE 802.15.4 maintainership
From: Dmitry Eremin-Solenikov @ 2012-07-14  6:15 UTC
  To: linux-kernel
  Cc: netdev, David S. Miller, Dmitry Eremin-Solenikov,
	Alexander Smirnov

As life goes on, developers' priorities shift a bit. Reflect the actual
changes in the maintainership of the IEEE 802.15.4 code: Sergey has mostly
stopped caring about this piece of code. Most of the recent work was done
by Alexander, so add him to the MAINTAINERS file to reflect his status and
to ease the handling of the respective patches.

Also add the new net/mac802154/ directory to the list of maintained files.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
---
 MAINTAINERS |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 150a29f..f03c703 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3403,13 +3403,14 @@ S:	Supported
 F:	drivers/idle/i7300_idle.c
 
 IEEE 802.15.4 SUBSYSTEM
+M:	Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
 M:	Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
-M:	Sergey Lapin <slapin@ossfans.org>
 L:	linux-zigbee-devel@lists.sourceforge.net (moderated for non-subscribers)
 W:	http://apps.sourceforge.net/trac/linux-zigbee
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/lowpan/lowpan.git
 S:	Maintained
 F:	net/ieee802154/
+F:	net/mac802154/
 F:	drivers/ieee802154/
 
 IIO SUBSYSTEM AND DRIVERS
-- 
1.7.10.4

* Re: [RFC 1/2] PCI-Express Non-Transparent Bridge Support
From: Jon Mason @ 2012-07-14  6:19 UTC
  To: Stephen Hemminger; +Cc: linux-kernel, netdev, linux-pci, Dave Jiang
In-Reply-To: <20120713171344.1066d3b1@nehalam.linuxnetplumber.net>

On Fri, Jul 13, 2012 at 05:13:44PM -0700, Stephen Hemminger wrote:
> On Fri, 13 Jul 2012 14:44:59 -0700
> Jon Mason <jon.mason@intel.com> wrote:
> 
> > A PCI-Express non-transparent bridge (NTB) is a point-to-point PCIe bus
> > connecting 2 systems, providing electrical isolation between the two subsystems.
> > A non-transparent bridge is functionally similar to a transparent bridge except
> > that both sides of the bridge have their own independent address domains.  The
> > host on one side of the bridge will not have the visibility of the complete
> > memory or I/O space on the other side of the bridge.  To communicate across the
> > non-transparent bridge, each NTB endpoint has one (or more) apertures exposed to
> > the local system.  Writes to these apertures are mirrored to memory on the
> > remote system.  Communications can also occur through the use of doorbell
> > registers that initiate interrupts to the alternate domain, and scratch-pad
> > registers accessible from both sides.
> > 
> > The NTB device driver is needed to configure these memory windows, doorbell, and
> > scratch-pad registers as well as use them in such a way as they can be turned
> > into a viable communication channel to the remote system.  ntb_hw.[ch]
> > determines the usage model (NTB to NTB or NTB to Root Port) and abstracts away
> > the underlying hardware to provide access and a common interface to the doorbell
> > registers, scratch pads, and memory windows.  These hardware interfaces are
> > exported so that other, non-mainlined kernel drivers can access these.
> > ntb_transport.[ch] also uses the exported interfaces in ntb_hw.[ch] to setup a
> > communication channel(s) and provide a reliable way of transferring data from
> > one side to the other, which it then exports so that "client" drivers can access
> > them.  These client drivers are used to provide a standard kernel interface
> > (i.e., Ethernet device) to NTB, such that Linux can transfer data from one
> > system to the other in a standard way.
> > 
> > Signed-off-by: Jon Mason <jon.mason@intel.com>
> 
> This driver does some reimplementing of standard type operations is this
> because you are trying to use the same code on multiple platforms?
> 
> Example:
> +
> +static void ntb_list_add_head(spinlock_t *lock, struct list_head *entry,
> +			      struct list_head *list)
> +{
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(lock, flags);
> +	list_add(entry, list);
> +	spin_unlock_irqrestore(lock, flags);
> +}
> +
> +static void ntb_list_add_tail(spinlock_t *lock, struct list_head *entry,
> +			      struct list_head *list)
> +{
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(lock, flags);
> +	list_add_tail(entry, list);
> +	spin_unlock_irqrestore(lock, flags);
> +}
> 
> Which are used on skb's and yet we already have sk_buff_head with locking?
> 
> I know you probably are committed to this API, but is there some way to
> reuse existing shared memory used by virtio-net between two ports?
> 
> 
The intention is to allow multiple client drivers/virtual devices to use
NTB as the transport to the remote system.  This is why a void* is passed
into the transport instead of an skb*, which makes all of the extra
bookkeeping necessary.  Currently, only the virtual Ethernet client has
been done, which may be part of the confusion.  I'd like to find a way to
have the virtio devices use NTB (and save me the work of reinventing the
wheel), but step one is getting this code accepted :)
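
To illustrate the point about opaque pointers, here is a minimal user-space
sketch of a queue entry that carries a client context as void*.  All names
(xport_entry, xport_queue, xport_enqueue, xport_dequeue) are hypothetical
and are not the ntb_transport API; locking is left out, so a real driver
would still need a spinlock around both operations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the ntb_transport API. */
struct xport_entry {
	struct xport_entry *next;
	void *cb_data;	/* opaque client context (an skb for a netdev client) */
	void *buf;	/* payload to move across the bridge */
	size_t len;
};

struct xport_queue {
	struct xport_entry *head;
	struct xport_entry *tail;
};

/* Append an entry; a real driver would hold a lock around this. */
static void xport_enqueue(struct xport_queue *q, struct xport_entry *e)
{
	assert(q && e);
	e->next = NULL;
	if (q->tail)
		q->tail->next = e;
	else
		q->head = e;
	q->tail = e;
}

/* Remove and return the oldest entry, or NULL if the queue is empty. */
static struct xport_entry *xport_dequeue(struct xport_queue *q)
{
	struct xport_entry *e;

	assert(q);
	e = q->head;
	if (!e)
		return NULL;
	q->head = e->next;
	if (!q->head)
		q->tail = NULL;
	e->next = NULL;
	return e;
}
```

Because the entry never looks inside cb_data, the same queue could serve a
client that is not a network device at all, which is the reason sk_buff_head
does not fit here.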

Thanks,
Jon

* [net 0/2][pull request] Intel Wired LAN Driver Updates
From: Jeff Kirsher @ 2012-07-14  7:47 UTC
  To: davem; +Cc: Jeff Kirsher, netdev, gospo, sassmann

This series contains fixes to e1000e.

The following are changes since commit 7ac2908e4b2edaec60e9090ddb4d9ceb76c05e7d:
  sch_sfb: Fix missing NULL check
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net master

Bruce Allan (1):
  e1000e: fix test for PHY being accessible on 82577/8/9 and I217

Tushar Dave (1):
  e1000e: Correct link check logic for 82571 serdes

 drivers/net/ethernet/intel/e1000e/82571.c   |    3 ++
 drivers/net/ethernet/intel/e1000e/ich8lan.c |   42 ++++++++++++++++++++-------
 2 files changed, 35 insertions(+), 10 deletions(-)

-- 
1.7.10.4

* [net 2/2] e1000e: fix test for PHY being accessible on 82577/8/9 and I217
From: Jeff Kirsher @ 2012-07-14  7:47 UTC
  To: davem; +Cc: Bruce Allan, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1342252063-27023-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Bruce Allan <bruce.w.allan@intel.com>

Occasionally, the PHY can be initially inaccessible when the first read of
a PHY register, e.g. PHY_ID1, happens (signified by the returned value
0xFFFF) but subsequent accesses of the PHY work as expected.  Add a retry
counter similar to how it is done in the generic e1000_get_phy_id().

Also, when the PHY is completely inaccessible (i.e. when subsequent reads
of the PHY_IDx registers return all F's), the MDIO access mode must be set
to slow before attempting to read the PHY ID again.  The functions that do
these latter two actions expect that the SW/FW/HW semaphore is not already
held, so the semaphore must be released before and re-acquired after
calling them; otherwise there is an inordinate amount of delay during
device initialization.

Reported-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/e1000e/ich8lan.c |   42 ++++++++++++++++++++-------
 1 file changed, 32 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
index 238ab2f..e3a7b07 100644
--- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
+++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
@@ -325,24 +325,46 @@ static inline void __ew32flash(struct e1000_hw *hw, unsigned long reg, u32 val)
  **/
 static bool e1000_phy_is_accessible_pchlan(struct e1000_hw *hw)
 {
-	u16 phy_reg;
-	u32 phy_id;
+	u16 phy_reg = 0;
+	u32 phy_id = 0;
+	s32 ret_val;
+	u16 retry_count;
+
+	for (retry_count = 0; retry_count < 2; retry_count++) {
+		ret_val = e1e_rphy_locked(hw, PHY_ID1, &phy_reg);
+		if (ret_val || (phy_reg == 0xFFFF))
+			continue;
+		phy_id = (u32)(phy_reg << 16);
 
-	e1e_rphy_locked(hw, PHY_ID1, &phy_reg);
-	phy_id = (u32)(phy_reg << 16);
-	e1e_rphy_locked(hw, PHY_ID2, &phy_reg);
-	phy_id |= (u32)(phy_reg & PHY_REVISION_MASK);
+		ret_val = e1e_rphy_locked(hw, PHY_ID2, &phy_reg);
+		if (ret_val || (phy_reg == 0xFFFF)) {
+			phy_id = 0;
+			continue;
+		}
+		phy_id |= (u32)(phy_reg & PHY_REVISION_MASK);
+		break;
+	}
 
 	if (hw->phy.id) {
 		if (hw->phy.id == phy_id)
 			return true;
-	} else {
-		if ((phy_id != 0) && (phy_id != PHY_REVISION_MASK))
-			hw->phy.id = phy_id;
+	} else if (phy_id) {
+		hw->phy.id = phy_id;
+		hw->phy.revision = (u32)(phy_reg & ~PHY_REVISION_MASK);
 		return true;
 	}
 
-	return false;
+	/*
+	 * In case the PHY needs to be in mdio slow mode,
+	 * set slow mode and try to get the PHY id again.
+	 */
+	hw->phy.ops.release(hw);
+	ret_val = e1000_set_mdio_slow_mode_hv(hw);
+	if (!ret_val)
+		ret_val = e1000e_get_phy_id(hw);
+	hw->phy.ops.acquire(hw);
+
+	return !ret_val;
 }
 
 /**
-- 
1.7.10.4

* Re: resurrecting tcphealth
From: Piotr Sawuk @ 2012-07-14  7:56 UTC
  To: netdev; +Cc: linux-kernel

On Sa, 14.07.2012, 03:31, valdis.kletnieks@vt.edu wrote:
> On Fri, 13 Jul 2012 16:55:44 -0700, Stephen Hemminger said:
>
>> >+			/* Course retransmit inefficiency- this packet has been received
>> twice. */
>> >+			tp->dup_pkts_recv++;
>> I don't understand that comment, could you use a better sentence please?
>
> I think what was intended was:
>
> /* Curse you, retransmit inefficiency! This packet has been received at
least twice */
>

LOL, no. I think "course retransmit" is short for "coarse-grained timeout
caused retransmit", but I can't be sure since I'm not the author of these
lines. I'll replace that comment with the non-shorthand version though.
However, I think the real comment here should be:

/*A perceived shortcoming of the standard TCP implementation: A
TCP receiver can get duplicate packets from the sender because it cannot
acknowledge packets that arrive out of order. These duplicates would happen
when the sender mistakenly thinks some packets have been lost by the network
because it does not receive acks for them but in reality they were
successfully received out of order. Since the receiver has no way of letting
the sender know about the receipt of these packets, they could potentially
be re-sent and re-received at the receiver. Not only do duplicate packets
waste precious Internet bandwidth but they hurt performance because the
sender mistakenly detects congestion from packet losses. The SACK TCP
extension specifically addresses this issue. A large number of duplicate
packets received would indicate a significant benefit to the wide adoption of
SACK. The "duplicate packets received" metric is computed at the
receiver and counts these packets on a per-connection basis.*/

As copied from his thesis at [1]. Also in the thesis he writes:

In our limited experiment, the results indicated no duplicate packets were
received on any connection in the 18 hour run. This leads us to several
conclusions. Since duplicate ACKs were seen on many connections we know that
some packets were lost or reordered, but unACKed reordered packets never
caused coarse-grained timeouts on our connections. Only these timeouts
will cause duplicate packets to be received since less severe out-of-order
conditions will be resolved with fast retransmits. The lack of coarse
timeouts may be due to the quality of UCSD's ActiveWeb network or the
paucity of large gaps between received packet groups. It should be noted
that Linux 2.2 implements fast retransmits for up to two packet gaps, thus
reducing the need for coarse-grained timeouts due to the lack of SACK.

[1] https://sacerdoti.org/tcphealth/tcphealth-paper.pdf
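
For reference, a minimal user-space sketch of what such a counter measures,
assuming a segment counts as a duplicate when it lies entirely below the
next expected sequence number.  Only the name dup_pkts_recv is taken from
the patch; rx_state, rx_segment and seq_after are made up for illustration,
and out-of-order queueing is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a is after b" for 32-bit sequence numbers. */
static bool seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Hypothetical receiver state; only dup_pkts_recv comes from the patch. */
struct rx_state {
	uint32_t rcv_nxt;		/* next in-order byte expected */
	unsigned long dup_pkts_recv;	/* segments that carried no new data */
};

/* Account for one received segment covering [seq, seq + len). */
static void rx_segment(struct rx_state *rx, uint32_t seq, uint32_t len)
{
	uint32_t end_seq = seq + len;

	assert(rx && len > 0);
	if (!seq_after(end_seq, rx->rcv_nxt)) {
		/* Entirely below rcv_nxt: all of it was already received. */
		rx->dup_pkts_recv++;
		return;
	}
	if (!seq_after(seq, rx->rcv_nxt))
		rx->rcv_nxt = end_seq;	/* in order or partial overlap */
	/* else: out of order; a real receiver would queue it (omitted) */
}
```

Feeding the same in-order segment twice bumps the counter once, and the
signed comparison keeps that true across sequence number wraparound.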

* Re: PROBLEM: Silent data corruption when using sendfile()
From: Hillf Danton @ 2012-07-14  8:04 UTC
  To: Johannes Truschnigg
  Cc: linux-kernel, Eric Dumazet, Willy Tarreau, Linux-Netdev
In-Reply-To: <20120713171835.GA26052@vault.local>

On Sat, Jul 14, 2012 at 1:18 AM, Johannes Truschnigg
<johannes@truschnigg.info> wrote:
> Hello good people of linux-kernel.
>
> I've been bothered by silent data corruption from my personal fileserver - no
> matter the Layer 7 protocol used, huge transfers sporadically ended up damaged
> in-flight. I used Samba/CIFS, NFS(v4, via TCP), Apache httpd 2.2, thttpd,
> python and netcat to verify this.
>
> I think I managed to track down the culprit: as soon as I disable sendfile()
> for all programs that support such a configuration (netcat, afaik, won't ever
> use sendfile() to transmit data over a socket, so the problem was never
> reproducible there in the first place), everything reverts to perfect and
> proper working condition.
>
> I've been experiencing this problem with vanilla kernel releases from the 3.3
> up until 3.4.0 series. I do not know if it also occurs with earlier releases,
> but I can verify if that is useful. I set up the environment for a minimal
> kind of testcase (a large ISO image file available from the server's local
> filesystem, as well as from a mounted NFS export - once via lo, and once via
> br0/eth0), and proceeded to do the following:
>
> i=0; for i in {1..100}
> do
>   echo "pass $i:"; sync; echo 3 > /proc/sys/vm/drop_caches
>   cmp -b /mnt/nfs-test/lo/tmp/X15-65741.iso /srv/files/pub/tmp/X15-65741.iso
> done
>
> I then rotated the source of the data, and tested the network-mount against
> the loopback-mount, as well as the network-mount against the local filesystem.
>
> Computing the file's md5sum in a loop whilst dropping caches after each
> iteration by reading it directly from its location in the filesystem produces
> the very same hash every time - I therefore think it's safe to assume the
> corruption is introduced when traversing the networking stack. The hash also
> does not change if I repeadetly compute the md5sum of the file as transferred
> by, e. g., Apache httpd or smbd with sendfile explicitly disabled.
>
> Please take a look at the attachment to see the actual output of the above
> script. It does not matter if I do an actual transfer over the network from my
> server to one of its clients (I verified the problem with two different client
> machines, one even running Windows), or if the server is both source and
> destination of the transfer - as long as sendfile is involed, some of the data
> will always become garbled sooner or later. That also leads me to believe that
> my internetworking devices (my switch in particular) is working just fine;
> testing bulky transfers from one host to another confirms this insofar as thus
> all data makes it through unscathed.
>
> As soon as I switch off sendfile-support (in, e. g. Samba or Apache httpd), I
> can run a series of thousands and more transfers, and not experience any
> corruption at all. Whenever the data gets fubared, there is no hint at
> anything fishy going on in the debug ringbuffer - curruption takes place in
> total silence.
>
> The system in question has an Intel Pro/1000 PCI-e NIC for doing the networked
> file transfers, and is backed by a md RAID5-Array with LVM2 on top. The 4GB of
> system memory (ECC-enabled UDIMM) are operating in S4ECD4ED mode as reported
> by EDAC, and there are no reported errors. The CPU I have installed is an AMD
> Athlon II X2 245e on an ASUS M4A88TD-M/USB3 Motherboard. It's running Gentoo
> for amd64. The box can run prime96 in torture mode and linpack just fine for
> days - I'm therefore assuming the hardware to be working correctly.
>
> I have attached my kernel's config (from 3.4.0, as that's the image that I
> have running right now) attached for sake of completeness, as well as some
> information for you to see how I tested, and what these tests actually
> produced. If you need any other information to help track this down, please
> let me know.
>
> If you decide to answer please keep me CC'd, as I'm not subscribed to this
> list.
>
> Just in case the numerous attachments get scrubbed/removed, I've also uploaded
> them to http://johannes.truschnigg.info/tmp/sendfile_data_corruption/
>
> Thanks for reading, and have a nice weekend everyone :)
>

Is the above corruption related to the one below?


On Tue, Jul 3, 2012 at 8:02 AM, Willy Tarreau <w@1wt.eu> wrote:
>
> In fact it has been true zero copy in 2.6.25 until we faced a large
> amount of data corruption and the zero copy was disabled in 2.6.25.X.
> Since then it remained that way until you brought your patches to
> re-instantiate it.

* Re: [RFC PATCH] tun: don't zeroize sock->file on detach
From: Al Viro @ 2012-07-14  8:15 UTC
  To: Stanislav Kinsbursky; +Cc: davem, netdev, ruanzhijie, linux-kernel
In-Reply-To: <20120711114753.24395.53193.stgit@localhost6.localdomain6>

On Wed, Jul 11, 2012 at 03:48:20PM +0400, Stanislav Kinsbursky wrote:
> This is a fix for bug, introduced in 3.4 kernel by commit
> 1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d, which, among other things, replaced
> simple sock_put() by sk_release_kernel(). Below is sequence, which leads to
> oops for non-persistent devices:
> 
> tun_chr_close()
> tun_detach()				<== tun->socket.file = NULL
> tun_free_netdev()
> sk_release_sock()
> sock_release(sock->file == NULL)
> iput(SOCK_INODE(sock))			<== dereference on NULL pointer
> 
> This patch just removes zeroing of socket's file from __tun_detach().
> sock_release() will do this.
> 
> Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
> ---
>  drivers/net/tun.c |    1 -
>  1 files changed, 0 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 987aeef..c1639f3 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -185,7 +185,6 @@ static void __tun_detach(struct tun_struct *tun)
>  	netif_tx_lock_bh(tun->dev);
>  	netif_carrier_off(tun->dev);
>  	tun->tfile = NULL;
> -	tun->socket.file = NULL;
>  	netif_tx_unlock_bh(tun->dev);

ACK, but I have to say that I don't like the entire area.  The games around sock->file
in general tend to be really nasty.  Examples:
1) net/9p/trans_fd.c:p9_socket_open():
	we come there with freshly created and connected struct socket in *csocket
	we do sock_map_fd() and bugger off if it fails
	we do get_file(csocket->file) twice and, having grabbed the references, close
the damn fd.
What happens if that races with close() on the same fd before we get to those get_file()?
We hit sock_close(), which calls sock_release(), which clears csocket->file.  Boom -
atomic_inc_long(&NULL->f_count) is not going to do us any good.  Outright bug, mitigated
only by the fact that all callchains to that place go through mount(2), so you have elevated
privs anyway.

2) with this sucker we hit an interesting interplay with vhost; note that the total effect
of tun_get_socket() does *not* include any refcount changes.  Nor should it - the caller
has a valid reference to struct file, after all.  Eventually the caller proceeds to drop
the same reference, by doing fput(sock->file).  And it will be the same struct file, but
proving that takes a lot of digging through the tun.c guts; the crucial observation is that
we never get to __tun_detach() as long as we have a reference to opened (cdev) file that
has been successfully attached at some point and that ones that hadn't been attached at
all wouldn't have passed through tun_get_socket().  IOW, it works, but it's brittle as hell.
Unless I've missed something in the analysis and it's really broken, that is.

Frankly, I would prefer to keep the reference to struct file for vhost explicitly in vhost
data structures.  Would be less dependent on the guts of tun/macvtap/whatnot that way...

3) iscsi goes as far as allocating fake struct file (with kzalloc(), and $DEITY help you
if you ever call fput() on that), presumably for the sake of sctp.  The only place in sctp
stack I see looking at sock->file is
        /* in-kernel sockets don't generally have a file allocated to them
         * if all they do is call sock_create_kern().
         */
        if (sk->sk_socket->file)
                f_flags = sk->sk_socket->file->f_flags;

        timeo = sock_sndtimeo(sk, f_flags & O_NONBLOCK);
in __sctp_connect() and AFAICS we could bloody well have left it NULL - we leave ->f_flags
zero in that code anyway and that's what __sctp_connect() will presume on NULL ->file.
I'm not familiar enough with sctp or iscsi, but at the first look it seems to be asking
for removal of all those games with ->file in the latter.

I really wonder if we have a single legitimate case for anything other than sock_alloc_file()
setting sock->file.  Anyone?

* Re: PROBLEM: Silent data corruption when using sendfile()
From: Eric Dumazet @ 2012-07-14  8:20 UTC
  To: Hillf Danton
  Cc: Johannes Truschnigg, linux-kernel, Willy Tarreau, Linux-Netdev
In-Reply-To: <CAJd=RBAntSubDBbJ292SzeoN4hTwBQ_Q23jt+Y6i-+vfrQ5EHQ@mail.gmail.com>

On Sat, 2012-07-14 at 16:04 +0800, Hillf Danton wrote:
> On Sat, Jul 14, 2012 at 1:18 AM, Johannes Truschnigg
> <johannes@truschnigg.info> wrote:
> > Hello good people of linux-kernel.
> >
> > I've been bothered by silent data corruption from my personal fileserver - no
> > matter the Layer 7 protocol used, huge transfers sporadically ended up damaged
> > in-flight. I used Samba/CIFS, NFS(v4, via TCP), Apache httpd 2.2, thttpd,
> > python and netcat to verify this.
> >
> > I think I managed to track down the culprit: as soon as I disable sendfile()
> > for all programs that support such a configuration (netcat, afaik, won't ever
> > use sendfile() to transmit data over a socket, so the problem was never
> > reproducible there in the first place), everything reverts to perfect and
> > proper working condition.
> >
> > I've been experiencing this problem with vanilla kernel releases from the 3.3
> > up until 3.4.0 series. I do not know if it also occurs with earlier releases,
> > but I can verify if that is useful. I set up the environment for a minimal
> > kind of testcase (a large ISO image file available from the server's local
> > filesystem, as well as from a mounted NFS export - once via lo, and once via
> > br0/eth0), and proceeded to do the following:
> >
> > i=0; for i in {1..100}
> > do
> >   echo "pass $i:"; sync; echo 3 > /proc/sys/vm/drop_caches
> >   cmp -b /mnt/nfs-test/lo/tmp/X15-65741.iso /srv/files/pub/tmp/X15-65741.iso
> > done
> >
> > I then rotated the source of the data, and tested the network-mount against
> > the loopback-mount, as well as the network-mount against the local filesystem.
> >
> > Computing the file's md5sum in a loop whilst dropping caches after each
> > iteration by reading it directly from its location in the filesystem produces
> > the very same hash every time - I therefore think it's safe to assume the
> > corruption is introduced when traversing the networking stack. The hash also
> > does not change if I repeadetly compute the md5sum of the file as transferred
> > by, e. g., Apache httpd or smbd with sendfile explicitly disabled.
> >
> > Please take a look at the attachment to see the actual output of the above
> > script. It does not matter if I do an actual transfer over the network from my
> > server to one of its clients (I verified the problem with two different client
> > machines, one even running Windows), or if the server is both source and
> > destination of the transfer - as long as sendfile is involed, some of the data
> > will always become garbled sooner or later. That also leads me to believe that
> > my internetworking devices (my switch in particular) is working just fine;
> > testing bulky transfers from one host to another confirms this insofar as thus
> > all data makes it through unscathed.
> >
> > As soon as I switch off sendfile-support (in, e. g. Samba or Apache httpd), I
> > can run a series of thousands and more transfers, and not experience any
> > corruption at all. Whenever the data gets fubared, there is no hint at
> > anything fishy going on in the debug ringbuffer - curruption takes place in
> > total silence.
> >
> > The system in question has an Intel Pro/1000 PCI-e NIC for doing the networked
> > file transfers, and is backed by a md RAID5-Array with LVM2 on top. The 4GB of
> > system memory (ECC-enabled UDIMM) are operating in S4ECD4ED mode as reported
> > by EDAC, and there are no reported errors. The CPU I have installed is an AMD
> > Athlon II X2 245e on an ASUS M4A88TD-M/USB3 Motherboard. It's running Gentoo
> > for amd64. The box can run prime96 in torture mode and linpack just fine for
> > days - I'm therefore assuming the hardware to be working correctly.
> >
> > I have attached my kernel's config (from 3.4.0, as that's the image that I
> > have running right now) attached for sake of completeness, as well as some
> > information for you to see how I tested, and what these tests actually
> > produced. If you need any other information to help track this down, please
> > let me know.
> >
> > If you decide to answer please keep me CC'd, as I'm not subscribed to this
> > list.
> >
> > Just in case the numerous attachments get scrubbed/removed, I've also uploaded
> > them to http://johannes.truschnigg.info/tmp/sendfile_data_corruption/
> >
> > Thanks for reading, and have a nice weekend everyone :)
> >
> 
> Is the above corruption related to the one below?
> 
> 
> On Tue, Jul 3, 2012 at 8:02 AM, Willy Tarreau <w@1wt.eu> wrote:
> >
> > In fact it has been true zero copy in 2.6.25 until we faced a large
> > amount of data corruption and the zero copy was disabled in 2.6.25.X.
> > Since then it remained that way until you brought your patches to
> > re-instantiate it.

Might be, or not (it could be a NIC bug).

Johannes, could you please try the latest kernel tree?

* Re: resurrecting tcphealth
From: Eric Dumazet @ 2012-07-14  8:27 UTC
  To: Piotr Sawuk; +Cc: netdev, linux-kernel
In-Reply-To: <cc6495b92f1df180c1ad43057ceb0b98.squirrel@webmail.univie.ac.at>

On Sat, 2012-07-14 at 09:56 +0200, Piotr Sawuk wrote:
> On Sa, 14.07.2012, 03:31, valdis.kletnieks@vt.edu wrote:
> > On Fri, 13 Jul 2012 16:55:44 -0700, Stephen Hemminger said:
> >
> >> >+			/* Course retransmit inefficiency- this packet has been received
> >> twice. */
> >> >+			tp->dup_pkts_recv++;
> >> I don't understand that comment, could you use a better sentence please?
> >
> > I think what was intended was:
> >
> > /* Curse you, retransmit inefficiency! This packet has been received at
> least twice */
> >
> 
> LOL, no. I think "course retransmit" is short for "coarse-grained timeout
> caused retransmit", but I can't be sure since I'm not the author of these
> lines. I'll replace that comment with the non-shorthand version though.
> However, I think the real comment here should be:
> 
> /*A perceived shortcoming of the standard TCP implementation: A
> TCP receiver can get duplicate packets from the sender because it cannot
> acknowledge packets that arrive out of order. These duplicates would happen
> when the sender mistakenly thinks some packets have been lost by the network
> because it does not receive acks for them but in reality they were
> successfully received out of order. Since the receiver has no way of letting
> the sender know about the receipt of these packets, they could potentially
> be re-sent and re-received at the receiver. Not only do duplicate packets
> waste precious Internet bandwidth but they hurt performance because the
> sender mistakenly detects congestion from packet losses. The SACK TCP
> extension specifically addresses this issue. A large number of duplicate
> packets received would indicate a significant benefit to the wide adoption of
> SACK. The "duplicate packets received" metric is computed at the
> receiver and counts these packets on a per-connection basis.*/
> 
> as copied from his thesis at [1]. also in the thesis he writes:
> 
> In our limited experiment, the results indicated no duplicate packets were
> received on any connection in the 18 hour run. This leads us to several
> conclusions. Since duplicate ACKs were seen on many connections we know that
> some packets were lost or reordered, but unACKed reordered packets never
> caused coarse-grained timeouts on our connections. Only these timeouts
> will cause duplicate packets to be received since less severe out-of-order
> conditions will be resolved with fast retransmits. The lack of coarse
> timeouts may be due to the quality of UCSD's ActiveWeb network or the
> paucity of large gaps between received packet groups. It should be noted
> that Linux 2.2 implements fast retransmits for up to two packet gaps, thus
> reducing the need for coarse-grained timeouts due to the lack of SACK.
> 
> [1] https://sacerdoti.org/tcphealth/tcphealth-paper.pdf

I'm not sure how pertinent this paper is today, in 2012.

I would prefer that you add global counters instead of per-TCP-connection
counters that most applications won't use at all.

Example of a more useful patch: add a counter of packets queued in the
out-of-order queue (in tcp_data_queue_ofo()).

"netstat -s" will display the total count, without any changes in
userland tools/applications.
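
A rough user-space sketch of the global-counter idea, with made-up names
(demo_mib, demo_stats, demo_inc_stat, demo_queue_ofo); it does not reproduce
the kernel's MIB identifiers or macros.  The kernel keeps such counters per
CPU and per network namespace, which this single array ignores.

```c
#include <assert.h>

/* Hypothetical counter IDs; the kernel's MIB names are not reproduced. */
enum demo_mib {
	DEMO_MIB_OFO_QUEUED,	/* segments placed on the out-of-order queue */
	DEMO_MIB_MAX
};

/* One global slot per counter, shared by every connection. */
static unsigned long demo_stats[DEMO_MIB_MAX];

static void demo_inc_stat(enum demo_mib id)
{
	assert(id < DEMO_MIB_MAX);
	demo_stats[id]++;
}

/* Stand-in for the point where a segment is queued out of order. */
static void demo_queue_ofo(void)
{
	demo_inc_stat(DEMO_MIB_OFO_QUEUED);
}
```

The cost is one increment on the queueing path and no extra field in every
socket, which is the trade-off against per-connection counters.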

* Re: [RFC 2/2] net: Add support for NTB virtual ethernet device
From: Jiri Pirko @ 2012-07-14  8:30 UTC
  To: Jon Mason; +Cc: linux-kernel, netdev, linux-pci, Dave Jiang
In-Reply-To: <20120714055034.GB4808@jonmason-lab>

Sat, Jul 14, 2012 at 07:50:35AM CEST, jon.mason@intel.com wrote:
>On Sat, Jul 14, 2012 at 01:14:03AM +0200, Jiri Pirko wrote:
>> Fri, Jul 13, 2012 at 11:45:00PM CEST, jon.mason@intel.com wrote:
>> >A virtual ethernet device that uses the NTB transport API to send/receive data.
>> >
>> >Signed-off-by: Jon Mason <jon.mason@intel.com>
>> >---
>> > drivers/net/Kconfig      |    4 +
>> > drivers/net/Makefile     |    1 +
>> > drivers/net/ntb_netdev.c |  411 ++++++++++++++++++++++++++++++++++++++++++++++
>> > 3 files changed, 416 insertions(+), 0 deletions(-)
>> > create mode 100644 drivers/net/ntb_netdev.c

<snip>

>> >+
>> >+static const struct net_device_ops ntb_netdev_ops = {
>> >+	.ndo_open = ntb_netdev_open,
>> >+	.ndo_stop = ntb_netdev_close,
>> >+	.ndo_start_xmit = ntb_netdev_start_xmit,
>> >+	.ndo_change_mtu = ntb_netdev_change_mtu,
>> >+	.ndo_tx_timeout = ntb_netdev_tx_timeout,
>> >+	.ndo_set_mac_address = eth_mac_addr,
>> 
>> Does your device support mac change while it's up and running?
>
>It's virtual ethernet, so there is no hardware limitation, only what is acceptable for the remote side to receive.

In that case, it would be good to do:
dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;

This enables mac change in eth_mac_addr() when iface is running.
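
Roughly, the check behaves like the following user-space model.  The struct,
the flag value and the helper names (model_dev, MODEL_LIVE_ADDR_CHANGE,
model_valid_addr, model_set_mac) are stand-ins for illustration, and the
error codes are my assumption of what the helper returns, not a copy of the
kernel source.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_ADDR_LEN		6
#define MODEL_LIVE_ADDR_CHANGE	0x1u	/* stand-in bit, not the kernel's value */

struct model_dev {
	unsigned int priv_flags;
	bool running;
	unsigned char addr[MODEL_ADDR_LEN];
};

/* Valid unicast address: multicast bit clear and not all zeroes. */
static bool model_valid_addr(const unsigned char *a)
{
	static const unsigned char zero[MODEL_ADDR_LEN];

	return !(a[0] & 0x01) && memcmp(a, zero, MODEL_ADDR_LEN) != 0;
}

/* Refuse a change on a running device unless the live-change flag is set. */
static int model_set_mac(struct model_dev *dev, const unsigned char *addr)
{
	assert(dev && addr);
	if (!(dev->priv_flags & MODEL_LIVE_ADDR_CHANGE) && dev->running)
		return -EBUSY;
	if (!model_valid_addr(addr))
		return -EADDRNOTAVAIL;
	memcpy(dev->addr, addr, MODEL_ADDR_LEN);
	return 0;
}
```

Without the flag, userspace has to bring the interface down before changing
the address; with it, the change goes through while the device is up.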

<snip>

>> >+
>> >+static int __init ntb_netdev_init_module(void)
>> >+{
>> >+	struct ntb_netdev *dev;
>> >+	int rc;
>> >+
>> >+	pr_info("%s: Probe\n", KBUILD_MODNAME);
>> >+
>> >+	netdev = alloc_etherdev(sizeof(struct ntb_netdev));
>> 
>> I might be missing something but this place (module init) does not seems
>> like a good place to do alloc_etherdev(). Do you want to support only
>> one netdevice instance?
>> 
>> Anyway, I think that using "static netdev" should be avoided in any case.
>> 
>
>It would fail the probe if there is no underlying ntb hardware, but it would make sense to check for that before allocing the etherdev.

But isn't there possible to have multiple ntb hardware devices? It would make
sense to register ntb device here with ntb core and let the core call
probe which would actually create new netdev.

Is there a limitation of one ntb netdevice per underlying ntb hardware device?
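
The probe-based layout suggested above could be sketched roughly as follows.
This is a user-space model under stated assumptions, not the NTB API: every
model_* name is hypothetical, and the "core" is just an array of devices that
calls a client's probe callback once per device, so each device gets its own
netdev instead of one static instance.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_MAX_DEVS 4

/* Hypothetical stand-in for one piece of ntb hardware. */
struct model_ntb_dev {
	int id;
	void *client_data; /* per-device state owned by the client */
};

/* Hypothetical client driver: the core calls probe() once per device. */
struct model_client {
	int (*probe)(struct model_ntb_dev *dev);
};

struct model_core {
	struct model_ntb_dev devs[MODEL_MAX_DEVS];
	size_t ndevs;
};

/* Returns the number of devices the client bound to. */
static int model_register_client(struct model_core *core,
				 const struct model_client *client)
{
	size_t i;
	int bound = 0;

	assert(core && client && client->probe);
	for (i = 0; i < core->ndevs; i++)
		if (client->probe(&core->devs[i]) == 0)
			bound++;
	return bound;
}

/* Hypothetical per-device netdev state, allocated in probe rather than at init. */
struct model_netdev {
	int ntb_id;
};

static int model_netdev_probe(struct model_ntb_dev *dev)
{
	struct model_netdev *nd = calloc(1, sizeof(*nd));

	if (!nd)
		return -ENOMEM;
	nd->ntb_id = dev->id;
	dev->client_data = nd;
	return 0;
}
```

With two devices in the core, registering the client yields two independent
model_netdev allocations, which is the property the static-netdev approach
cannot provide.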

Thanks,
Jiri

^ permalink raw reply

* Re: PROBLEM: Silent data corruption when using sendfile()
From: Willy Tarreau @ 2012-07-14  8:31 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Hillf Danton, Johannes Truschnigg, linux-kernel, Linux-Netdev
In-Reply-To: <1342254042.3265.9017.camel@edumazet-glaptop>

On Sat, Jul 14, 2012 at 10:20:41AM +0200, Eric Dumazet wrote:
> > On Tue, Jul 3, 2012 at 8:02 AM, Willy Tarreau <w@1wt.eu> wrote:
> > >
> > > In fact it has been true zero copy in 2.6.25 until we faced a large
> > > amount of data corruption and the zero copy was disabled in 2.6.25.X.
> > > Since then it remained that way until you brought your patches to
> > > re-instantiate it.
> 
> Might be, or not (could be a NIC bug)

I may be wrong but what I recall from this bug was an issue when
forwarding TCP between two NICs, related to linear vs non-linear
data (I have memories of something around data not yet ACKed being
replaced before being retransmitted but I may be wrong). Anyway,
the way it was fixed consisted in simply disabling the zero-copy
code path. So this should be something different from what Johannes
reports. Maybe a regression since then though.

> Please Johannes could you try latest kernel tree ?

It would be useful, especially given the amount of changes you performed
in this area in latest version, it could be very possible that this new
bug got fixed as a side effect !

Regards,
Willy

^ permalink raw reply

* [net 1/2] e1000e: Correct link check logic for 82571 serdes
From: Jeff Kirsher @ 2012-07-14  8:34 UTC (permalink / raw)
  To: davem
  Cc: Tushar Dave, netdev, gospo, sassmann, stable, dnelson,
	bruce.w.allan, Jeff Kirsher

From: Tushar Dave <tushar.n.dave@intel.com>

SYNCH bit and IV bit of RXCW register are sticky. Before examining these bits,
RXCW should be read twice to filter out one-time false events and have correct
values for these bits. Incorrect values of these bits in link check logic can
cause weird link stability issues if auto-negotiation fails.

CC: stable <stable@vger.kernel.org> [2.6.38+]
Reported-by: Dean Nelson <dnelson@redhat.com>
Signed-off-by: Tushar Dave <tushar.n.dave@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/e1000e/82571.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/intel/e1000e/82571.c b/drivers/net/ethernet/intel/e1000e/82571.c
index 36db4df..1f063dc 100644
--- a/drivers/net/ethernet/intel/e1000e/82571.c
+++ b/drivers/net/ethernet/intel/e1000e/82571.c
@@ -1572,6 +1572,9 @@ static s32 e1000_check_for_serdes_link_82571(struct e1000_hw *hw)
 	ctrl = er32(CTRL);
 	status = er32(STATUS);
 	rxcw = er32(RXCW);
+	/* SYNCH bit and IV bit are sticky */
+	udelay(10);
+	rxcw = er32(RXCW);
 
 	if ((rxcw & E1000_RXCW_SYNCH) && !(rxcw & E1000_RXCW_IV)) {
 
-- 
1.7.10.4

^ permalink raw reply related
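
The double read in the patch above can be illustrated with a small user-space
model. It assumes "sticky" means the bits latch a past event and are cleared
by a read, which is an assumption about the hardware made for illustration
only. The register struct, the MODEL_* bit values and the model_* functions
are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_RXCW_SYNCH (1u << 30) /* illustrative bit position */
#define MODEL_RXCW_IV    (1u << 27) /* illustrative bit position */

/* Hypothetical register: live state plus bits latched by earlier events. */
struct model_reg {
	uint32_t live;
	uint32_t latched;
};

/* A read returns live and latched bits, then clears the latch. */
static uint32_t model_read(struct model_reg *r)
{
	uint32_t v;

	assert(r);
	v = r->live | r->latched;
	r->latched = 0;
	return v;
}

/* Same condition as the driver: synchronized and no invalid symbol seen. */
static int model_link_ok(uint32_t rxcw)
{
	return (rxcw & MODEL_RXCW_SYNCH) && !(rxcw & MODEL_RXCW_IV);
}

/* Single read: a stale latched IV bit makes a good link look bad. */
static int model_check_once(struct model_reg *r)
{
	return model_link_ok(model_read(r));
}

/* Two reads: the first flushes the latch, the second reflects current state. */
static int model_check_twice(struct model_reg *r)
{
	(void)model_read(r);
	return model_link_ok(model_read(r));
}
```

In this model a register with a live SYNCH bit and a stale latched IV bit
fails the single-read check but passes the double-read one.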

* Re: PROBLEM: Silent data corruption when using sendfile()
From: Johannes Truschnigg @ 2012-07-14 10:13 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Eric Dumazet, Hillf Danton, linux-kernel, Linux-Netdev
In-Reply-To: <20120714083136.GO16256@1wt.eu>

[-- Attachment #1: Type: text/plain, Size: 1154 bytes --]

On Sat, Jul 14, 2012 at 10:31:36AM +0200, Willy Tarreau wrote:
> > Please Johannes could you try latest kernel tree ?
> 
> It would be useful, especially given the amount of changes you performed
> in this area in latest version, it could be very possible that this new
> bug got fixed as a side effect !

I upgraded to 3.4.4 (identical config as the 3.4.0 build I've been running)
and what can I say - the problem really seems to have disappeared. I performed
about 3700 iterations of my previous tests over the night, and the data always
turned out to be OK, not a single byte turned out kaput!

I wish I would have tested that earlier, and spared you the noise... well,
maybe someone who runs into a similar problem in the future will have this
discovery save her/him some time and headaches and make her/him just upgrade
kernels :)

Thanks a lot for your polite and quick responses!

-- 
with best regards:
- Johannes Truschnigg ( johannes@truschnigg.info )

www:   http://johannes.truschnigg.info/
phone: +43 650 2 133337
xmpp:  johannes@truschnigg.info

Please do not bother me with HTML-email or attachments. Thank you.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply

* Re: PROBLEM: Silent data corruption when using sendfile()
From: Eric Dumazet @ 2012-07-14 10:33 UTC (permalink / raw)
  To: Johannes Truschnigg
  Cc: Willy Tarreau, Hillf Danton, linux-kernel, Linux-Netdev
In-Reply-To: <20120714101321.GA26329@vault.local>

On Sat, 2012-07-14 at 12:13 +0200, Johannes Truschnigg wrote:
> On Sat, Jul 14, 2012 at 10:31:36AM +0200, Willy Tarreau wrote:
> > > Please Johannes could you try latest kernel tree ?
> > 
> > It would be useful, especially given the amount of changes you performed
> > in this area in latest version, it could be very possible that this new
> > bug got fixed as a side effect !
> 
> I upgraded to 3.4.4 (identical config as the 3.4.0 build I've been running)
> and what can I say - the problem really seems to have disappeared. I performed
> about 3700 iterations of my previous tests over the night, and the data always
> turned out to be OK, not a single byte turned out kaput!
> 
> I wish I would have tested that earlier, and spared you the noise... well,
> maybe someone who runs into a similar problem in the future will have this
> discovery save her/him some time and headaches and make her/him just upgrade
> kernels :)
> 
> Thanks a lot for your polite and quick responses!
> 

Nice to hear. Now we should make sure we have all needed fixes for prior
stable kernels as well !

Still trying to understand the issue, since I thought I only did
optimizations, not bug fixes. So maybe real bug is still there but its
probability of occurrence lowered enough to not hit your workload.
 
Hmmm...

^ permalink raw reply

* Re: PROBLEM: Silent data corruption when using sendfile()
From: Willy Tarreau @ 2012-07-14 10:44 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Johannes Truschnigg, Hillf Danton, linux-kernel, Linux-Netdev
In-Reply-To: <1342262004.3265.9279.camel@edumazet-glaptop>

On Sat, Jul 14, 2012 at 12:33:24PM +0200, Eric Dumazet wrote:
> On Sat, 2012-07-14 at 12:13 +0200, Johannes Truschnigg wrote:
> > On Sat, Jul 14, 2012 at 10:31:36AM +0200, Willy Tarreau wrote:
> > > > Please Johannes could you try latest kernel tree ?
> > > 
> > > It would be useful, especially given the amount of changes you performed
> > > in this area in latest version, it could be very possible that this new
> > > bug got fixed as a side effect !
> > 
> > I upgraded to 3.4.4 (identical config as the 3.4.0 build I've been running)
> > and what can I say - the problem really seems to have disappeared. I performed
> > about 3700 iterations of my previous tests over the night, and the data always
> > turned out to be OK, not a single byte turned out kaput!
> > 
> > I wish I would have tested that earlier, and spared you the noise... well,
> > maybe someone who runs into a similar problem in the future will have this
> > discovery save her/him some time and headaches and make her/him just upgrade
> > kernels :)
> > 
> > Thanks a lot for your polite and quick responses!
> > 
> 
> Nice to hear. Now we should make sure we have all needed fixes for prior
> stable kernels as well !
> 
> Still trying to understand the issue, since I thought I only did
> optimizations, not bug fixes. So maybe real bug is still there but its
> probability of occurrence lowered enough to not hit your workload.

Please note that Johannes tested 3.4.4 while your changes are in 3.5-rc.

I'm wondering whether this patch, merged into 3.4.2, has an impact on
sendfile:

  commit b642cb6a143da812f188307c2661c0357776a9d0
  Author: Konstantin Khlebnikov <khlebnikov@openvz.org>
  Date:   Tue Jun 5 21:36:33 2012 +0400

    radix-tree: fix contiguous iterator
    
    commit fffaee365fded09f9ebf2db19066065fa54323c3 upstream.
    
    This patch fixes bug in macro radix_tree_for_each_contig().
    
    If radix_tree_next_slot() sees NULL in next slot it returns NULL, but following
    radix_tree_next_chunk() switches iterating into next chunk. As result iterating
    becomes non-contiguous and breaks vfs "splice" and all its users.

Willy

^ permalink raw reply
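
The failure mode described in the quoted commit message can be sketched with
a toy model. This is not the radix-tree code: slots are a flat array split
into fixed-size chunks, 0 marks an empty slot, and the model_* names are
hypothetical. The buggy variant jumps to the next chunk when it meets a hole,
so the run it reports is no longer contiguous; the fixed variant stops at the
first hole.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_CHUNK 4 /* slots per chunk in this toy model */

/* Buggy: on a hole, skip ahead to the next chunk and keep counting. */
static size_t model_count_contig_buggy(const int *slots, size_t n, size_t start)
{
	size_t i = start, count = 0;

	assert(slots);
	while (i < n) {
		if (slots[i] == 0) {
			i = (i / MODEL_CHUNK + 1) * MODEL_CHUNK;
			continue;
		}
		count++;
		i++;
	}
	return count;
}

/* Fixed: a contiguous walk ends at the first hole. */
static size_t model_count_contig_fixed(const int *slots, size_t n, size_t start)
{
	size_t i, count = 0;

	assert(slots);
	for (i = start; i < n && slots[i] != 0; i++)
		count++;
	return count;
}
```

For the layout {1,1,0,1 | 1,1,1,1} the buggy walk reports six entries starting
at index 0 while only two are actually contiguous.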

* Re: PROBLEM: Silent data corruption when using sendfile()
From: Eric Dumazet @ 2012-07-14 11:06 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Johannes Truschnigg, Hillf Danton, linux-kernel, Linux-Netdev
In-Reply-To: <20120714104441.GP16256@1wt.eu>

On Sat, 2012-07-14 at 12:44 +0200, Willy Tarreau wrote:
> On Sat, Jul 14, 2012 at 12:33:24PM +0200, Eric Dumazet wrote:
> > On Sat, 2012-07-14 at 12:13 +0200, Johannes Truschnigg wrote:
> > > On Sat, Jul 14, 2012 at 10:31:36AM +0200, Willy Tarreau wrote:
> > > > > Please Johannes could you try latest kernel tree ?
> > > > 
> > > > It would be useful, especially given the amount of changes you performed
> > > > in this area in latest version, it could be very possible that this new
> > > > bug got fixed as a side effect !
> > > 
> > > I upgraded to 3.4.4 (identical config as the 3.4.0 build I've been running)
> > > and what can I say - the problem really seems to have disappeared. I performed
> > > about 3700 iterations of my previous tests over the night, and the data always
> > > turned out to be OK, not a single byte turned out kaput!
> > > 
> > > I wish I would have tested that earlier, and spared you the noise... well,
> > > maybe someone who runs into a similar problem in the future will have this
> > > discovery save her/him some time and headaches and make her/him just upgrade
> > > kernels :)
> > > 
> > > Thanks a lot for your polite and quick responses!
> > > 
> > 
> > Nice to hear. Now we should make sure we have all needed fixes for prior
> > stable kernels as well !
> > 
> > Still trying to understand the issue, since I thought I only did
> > optimizations, not bug fixes. So maybe real bug is still there but its
> > probability of occurrence lowered enough to not hit your workload.
> 
> Please note that Johannes tested 3.4.4 while your changes are in 3.5-rc.
> 
> I'm wondering whether this patch, merged into 3.4.2, has an impact on
> sendfile:
> 
>   commit b642cb6a143da812f188307c2661c0357776a9d0
>   Author: Konstantin Khlebnikov <khlebnikov@openvz.org>
>   Date:   Tue Jun 5 21:36:33 2012 +0400
> 
>     radix-tree: fix contiguous iterator
>     
>     commit fffaee365fded09f9ebf2db19066065fa54323c3 upstream.
>     
>     This patch fixes bug in macro radix_tree_for_each_contig().
>     
>     If radix_tree_next_slot() sees NULL in next slot it returns NULL, but following
>     radix_tree_next_chunk() switches iterating into next chunk. As result iterating
>     becomes non-contiguous and breaks vfs "splice" and all its users.
> 
> Willy
> 


Hmmm, this is supposed to fix a bug introduced in 3.4, no ?

So 3.3 kernel should work well ?

^ permalink raw reply

* Re: PROBLEM: Silent data corruption when using sendfile()
From: Thorsten Kranzkowski @ 2012-07-14 11:44 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Johannes Truschnigg, Willy Tarreau, Hillf Danton, linux-kernel,
	Linux-Netdev
In-Reply-To: <1342262004.3265.9279.camel@edumazet-glaptop>

On Sat, Jul 14, 2012 at 12:33:24PM +0200, Eric Dumazet wrote:
> On Sat, 2012-07-14 at 12:13 +0200, Johannes Truschnigg wrote:
> > On Sat, Jul 14, 2012 at 10:31:36AM +0200, Willy Tarreau wrote:
> > > > Please Johannes could you try latest kernel tree ?
> > > 
> > > It would be useful, especially given the amount of changes you performed
> > > in this area in latest version, it could be very possible that this new
> > > bug got fixed as a side effect !
> > 
> > I upgraded to 3.4.4 (identical config as the 3.4.0 build I've been running)
> > and what can I say - the problem really seems to have disappeared. I performed
> > about 3700 iterations of my previous tests over the night, and the data always
> > turned out to be OK, not a single byte turned out kaput!
> > 
> > I wish I would have tested that earlier, and spared you the noise... well,
> > maybe someone who runs into a similar problem in the future will have this
> > discovery save her/him some time and headaches and make her/him just upgrade
> > kernels :)
> > 
> > Thanks a lot for your polite and quick responses!
> > 
> 
> Nice to hear. Now we should make sure we have all needed fixes for prior
> stable kernels as well !
> 
> Still trying to understand the issue, since I thought I only did
> optimizations, not bug fixes. So maybe real bug is still there but its
> probability of occurrence lowered enough to not hit your workload.
>  
> Hmmm...
> 

Not sure if this is related, but I had a similar data corruption problem:
Reading data from filesystem 'normally' (including through nfs) showed
corruption at random places, mostly 0xff turning into 0xfe.
Reading with O_DIRECT (I used 'dd iflag=direct') was OK.

I found my problem to be fixed by
fffaee365fded09f9ebf2db19066065fa54323c3 (upstream)
which was backported as
b642cb6a143da812f188307c2661c0357776a9d0 (stable, v3.4.1-66-gb642cb6)


Bye,
Thorsten

-- 
| Thorsten Kranzkowski        Internet: dl8bcu@dl8bcu.de                      |
| Mobile: ++49 170 1876134       Snail: Kiebitzstr. 14, 49324 Melle, Germany  |
| Ampr: dl8bcu@db0lj.#rpl.deu.eu, dl8bcu@marvin.dl8bcu.ampr.org [44.130.8.19] |

^ permalink raw reply

* Re: PROBLEM: Silent data corruption when using sendfile()
From: Willy Tarreau @ 2012-07-14 13:15 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Johannes Truschnigg, Hillf Danton, linux-kernel, Linux-Netdev
In-Reply-To: <1342263967.3265.9347.camel@edumazet-glaptop>

On Sat, Jul 14, 2012 at 01:06:07PM +0200, Eric Dumazet wrote:
> On Sat, 2012-07-14 at 12:44 +0200, Willy Tarreau wrote:
> > On Sat, Jul 14, 2012 at 12:33:24PM +0200, Eric Dumazet wrote:
> > > On Sat, 2012-07-14 at 12:13 +0200, Johannes Truschnigg wrote:
> > > > On Sat, Jul 14, 2012 at 10:31:36AM +0200, Willy Tarreau wrote:
> > > > > > Please Johannes could you try latest kernel tree ?
> > > > > 
> > > > > It would be useful, especially given the amount of changes you performed
> > > > > in this area in latest version, it could be very possible that this new
> > > > > bug got fixed as a side effect !
> > > > 
> > > > I upgraded to 3.4.4 (identical config as the 3.4.0 build I've been running)
> > > > and what can I say - the problem really seems to have disappeared. I performed
> > > > about 3700 iterations of my previous tests over the night, and the data always
> > > > turned out to be OK, not a single byte turned out kaput!
> > > > 
> > > > I wish I would have tested that earlier, and spared you the noise... well,
> > > > maybe someone who runs into a similar problem in the future will have this
> > > > discovery save her/him some time and headaches and make her/him just upgrade
> > > > kernels :)
> > > > 
> > > > Thanks a lot for your polite and quick responses!
> > > > 
> > > 
> > > Nice to hear. Now we should make sure we have all needed fixes for prior
> > > stable kernels as well !
> > > 
> > > Still trying to understand the issue, since I thought I only did
> > > optimizations, not bug fixes. So maybe real bug is still there but its
> > > probability of occurrence lowered enough to not hit your workload.
> > 
> > Please note that Johannes tested 3.4.4 while your changes are in 3.5-rc.
> > 
> > I'm wondering whether this patch, merged into 3.4.2, has an impact on
> > sendfile:
> > 
> >   commit b642cb6a143da812f188307c2661c0357776a9d0
> >   Author: Konstantin Khlebnikov <khlebnikov@openvz.org>
> >   Date:   Tue Jun 5 21:36:33 2012 +0400
> > 
> >     radix-tree: fix contiguous iterator
> >     
> >     commit fffaee365fded09f9ebf2db19066065fa54323c3 upstream.
> >     
> >     This patch fixes bug in macro radix_tree_for_each_contig().
> >     
> >     If radix_tree_next_slot() sees NULL in next slot it returns NULL, but following
> >     radix_tree_next_chunk() switches iterating into next chunk. As result iterating
> >     becomes non-contiguous and breaks vfs "splice" and all its users.
> > 
> > Willy
> > 
> 
> 
> Hmmm, this is supposed to fix a bug introduced in 3.4, no ?
> 
> So 3.3 kernel should work well ?

You're right indeed. So maybe it's not the same bug. Or maybe Johannes
was affected by two different bugs in both versions, since Thorsten's
report seems to point the finger at the same bug.

Johannes, are you certain that you were having the exact same issue
with 3.3 ?

Willy

^ permalink raw reply

* [PATCH net-next] netem: refine early skb orphaning
From: Eric Dumazet @ 2012-07-14 13:16 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Hagen Paul Pfeifer, Mark Gordon, Andreas Terzis,
	Yuchung Cheng

From: Eric Dumazet <edumazet@google.com>

netem does an early orphaning of skbs. Doing so breaks TCP Small Queue
or any mechanism relying on socket sk_wmem_alloc feedback.

Ideally, we should perform this orphaning after the rate module and
before the delay module, to mimic what happens on a real link :

skb orphaning is indeed normally done at TX completion, before the
transit on the link.

+-------+   +--------+  +---------------+  +-----------------+
+ Qdisc +---> Device +--> TX completion +--> links / hops    +->
+       +   +  xmit  +  + skb orphaning +  + propagation     +
+-------+   +--------+  +---------------+  +-----------------+
      < rate limiting >                  < delay, drops, reorders >

If netem is used without delay feature (drops, reorders, rate
limiting), then we should avoid early skb orphaning, to keep pressure
on sockets as long as packets are still in qdisc queue.

Ideally, netem should be refactored to implement delay module
as the last stage. Current algorithm merges the two phases
(rate limiting + delay) so it's not correct.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Mark Gordon <msg@google.com>
Cc: Andreas Terzis <aterzis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
---
 net/sched/sch_netem.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index c412ad0..298c0dd 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -380,7 +380,14 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
 	}
 
-	skb_orphan(skb);
+	/* If a delay is expected, orphan the skb. (orphaning usually takes
+	 * place at TX completion time, so _before_ the link transit delay)
+	 * Ideally, this orphaning should be done after the rate limiting
+	 * module, because this breaks TCP Small Queue, and other mechanisms
+	 * based on socket sk_wmem_alloc.
+	 */
+	if (q->latency || q->jitter)
+		skb_orphan(skb);
 
 	/*
 	 * If we need to duplicate packet, then re-insert at top of the

^ permalink raw reply related
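
The effect of the conditional orphaning in the patch above can be modelled in
user space. This is a sketch under simplifying assumptions, not the qdisc
code: the model_* structs stand in for the socket, the skb and the netem
private data, and wmem_alloc stands in for sk_wmem_alloc. Orphaning drops the
skb's charge against the socket, so an skb that stays owned keeps
back-pressure on the sender.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the socket, skb and netem private data. */
struct model_sock {
	unsigned int wmem_alloc; /* bytes still charged to the socket */
};

struct model_skb {
	struct model_sock *sk;
	unsigned int truesize;
};

struct model_netem {
	long latency;
	long jitter;
};

/* Charge the skb to its owning socket. */
static void model_skb_set_owner(struct model_skb *skb, struct model_sock *sk)
{
	skb->sk = sk;
	sk->wmem_alloc += skb->truesize;
}

/* Release the charge; the socket no longer sees this skb as in flight. */
static void model_skb_orphan(struct model_skb *skb)
{
	if (skb->sk) {
		skb->sk->wmem_alloc -= skb->truesize;
		skb->sk = NULL;
	}
}

/* Orphan only when a delay is configured, as in the patch; queueing is omitted. */
static void model_netem_enqueue(const struct model_netem *q, struct model_skb *skb)
{
	assert(q && skb);
	if (q->latency || q->jitter)
		model_skb_orphan(skb);
}
```

Without delay or jitter the socket stays charged for the queued skb; with a
delay configured the charge is released at enqueue time.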

* Re: PROBLEM: Silent data corruption when using sendfile()
From: Hillf Danton @ 2012-07-14 14:08 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Johannes Truschnigg, linux-kernel, Willy Tarreau, Linux-Netdev
In-Reply-To: <1342254042.3265.9017.camel@edumazet-glaptop>

On Sat, Jul 14, 2012 at 4:20 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Might be, or not (could be a NIC bug)
>
Dunno why sendfile sits in the layer of NIC and
how they interact.

^ permalink raw reply

* Re: PROBLEM: Silent data corruption when using sendfile()
From: Eric Dumazet @ 2012-07-14 14:19 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Johannes Truschnigg, linux-kernel, Willy Tarreau, Linux-Netdev
In-Reply-To: <CAJd=RBAJOqJDSBpaaB+2-WU_pa5vChXSf6TbLH8fi3HNt6hZ9w@mail.gmail.com>

On Sat, 2012-07-14 at 22:08 +0800, Hillf Danton wrote:
> On Sat, Jul 14, 2012 at 4:20 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > Might be, or not (could be a NIC bug)
> >
> Dunno why sendfile sits in the layer of NIC and
> how they interact.

sendfile() relies heavily on TSO capabilities, a buggy NIC could
corrupt frame content on some obscure occasions.

We had some known cases on IPv6 for example.

^ permalink raw reply

* Re: PROBLEM: Silent data corruption when using sendfile()
From: Willy Tarreau @ 2012-07-14 14:56 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Hillf Danton, Johannes Truschnigg, linux-kernel, Linux-Netdev
In-Reply-To: <1342275540.3265.9760.camel@edumazet-glaptop>

On Sat, Jul 14, 2012 at 04:19:00PM +0200, Eric Dumazet wrote:
> On Sat, 2012-07-14 at 22:08 +0800, Hillf Danton wrote:
> > On Sat, Jul 14, 2012 at 4:20 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > >
> > > Might be, or not (could be a NIC bug)
> > >
> > Dunno why sendfile sits in the layer of NIC and
> > how they interact.
> 
> sendfile() relies heavily on TSO capabilities, a buggy NIC could
> corrupt frame content on some obscure occasions.
> 
> We had some known cases on IPv6 for example.

Similarly, I recall having experienced bugs on early Yukon chips years
ago that would regularly emit total crap on the wire.

Willy

^ permalink raw reply

* Re: [PATCH] iptables: xt_recent: Add optional mask option for xt_recent
From: Pablo Neira Ayuso @ 2012-07-14 15:05 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: netfilter-devel, Linux netdev
In-Reply-To: <1337285337-13619-1-git-send-email-denys@visp.net.lb>

Hi Denys,

On Thu, May 17, 2012 at 11:08:57PM +0300, Denys Fedoryshchenko wrote:
> Use case for this feature:
> 1)In some occasions if you need to allow,block,match specific subnet.
> 2)I can use recent as a trigger when netfilter rule matches, with mask 0.0.0.0
> 
> Tested for backward compatibility:
> )old (userspace) iptables, new kernel
> )old kernel, new iptables
> )new kernel, new iptables

I've cleaned up this patch:

http://git.netfilter.org/cgi-bin/gitweb.cgi?p=iptables.git;a=commit;h=73bf03981dfaee48ac1d6da380d46501a96cc83e

It's not yet in master. Please, check that this is correct.

BTW, the man page update is still missing.

^ permalink raw reply

* [PATCH 2/6] drivers/net/can/softing/softing_main.c: ensure a consistent return value in error case
From: Julia Lawall @ 2012-07-14 16:43 UTC (permalink / raw)
  To: Wolfgang Grandegger
  Cc: kernel-janitors, Marc Kleine-Budde, linux-can, netdev,
	linux-kernel
In-Reply-To: <1342284188-19176-1-git-send-email-Julia.Lawall@lip6.fr>

From: Julia Lawall <Julia.Lawall@lip6.fr>

Typically, the return value desired for the failure of a function with an
integer return value is a negative integer.  In these cases, the return
value is sometimes a negative integer and sometimes 0, due to a subsequent
initialization of the return variable within the loop.

A simplified version of the semantic match that finds this problem is:
(http://coccinelle.lip6.fr/)

//<smpl>
@r exists@
identifier ret;
position p;
constant C;
expression e1,e3,e4;
statement S;
@@

ret = -C
... when != ret = e3
    when any
if@p (...) S
... when any
if (\(ret != 0\|ret < 0\|ret > 0\) || ...) { ... return ...; }
... when != ret = e3
    when any
*if@p (...)
{
  ... when != ret = e4
  return ret;
}
//</smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>

---
 drivers/net/can/softing/softing_main.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/can/softing/softing_main.c b/drivers/net/can/softing/softing_main.c
index a7c77c7..f2a221e 100644
--- a/drivers/net/can/softing/softing_main.c
+++ b/drivers/net/can/softing/softing_main.c
@@ -826,12 +826,12 @@ static __devinit int softing_pdev_probe(struct platform_device *pdev)
 		goto sysfs_failed;
 	}
 
-	ret = -ENOMEM;
 	for (j = 0; j < ARRAY_SIZE(card->net); ++j) {
 		card->net[j] = netdev =
 			softing_netdev_create(card, card->id.chip[j]);
 		if (!netdev) {
 			dev_alert(&pdev->dev, "failed to make can[%i]", j);
+			ret = -ENOMEM;
 			goto netdev_failed;
 		}
 		priv = netdev_priv(card->net[j]);


^ permalink raw reply related
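
The pattern the semantic match looks for can be reproduced in a few lines of
user-space C. This is an illustrative model, not the softing driver: the
model_* functions are hypothetical, and fail_at simulates the iteration in
which the allocation fails. In the buggy variant the error code is set once
before the loop and then overwritten by a later successful call, so a failure
in a later iteration returns 0.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-iteration step that succeeds and returns 0. */
static int model_step(void)
{
	return 0;
}

/* Buggy: ret is set before the loop but clobbered inside it. */
static int model_probe_buggy(int n, int fail_at)
{
	int ret = -ENOMEM;
	int j;

	assert(n >= 0);
	for (j = 0; j < n; j++) {
		if (j == fail_at)
			goto failed;	/* assumes ret is still -ENOMEM */
		ret = model_step();	/* sets ret to 0 on success */
		if (ret < 0)
			goto failed;
	}
	return 0;
failed:
	return ret;
}

/* Fixed: set the error code at the point of failure, as the patch does. */
static int model_probe_fixed(int n, int fail_at)
{
	int ret;
	int j;

	assert(n >= 0);
	for (j = 0; j < n; j++) {
		if (j == fail_at) {
			ret = -ENOMEM;
			goto failed;
		}
		ret = model_step();
		if (ret < 0)
			goto failed;
	}
	return 0;
failed:
	return ret;
}
```

A failure in the first iteration is reported correctly by both variants; only
a failure in a later iteration exposes the difference.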
