Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v4 net-next] fix unsafe set_memory_rw from softirq
From: David Miller @ 2013-10-07 19:17 UTC (permalink / raw)
  To: eric.dumazet
  Cc: ast, dborkman, edumazet, heiko.carstens, linux-arm-kernel,
	linuxppc-dev, linux-s390, netdev
In-Reply-To: <1381078592.12191.0.camel@edumazet-glaptop.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 06 Oct 2013 09:56:32 -0700

> On Fri, 2013-10-04 at 00:14 -0700, Alexei Starovoitov wrote:
>> on x86 system with net.core.bpf_jit_enable = 1
> 
>> cannot reuse jited filter memory, since it's readonly,
>> so use original bpf insns memory to hold work_struct
>> 
>> defer kfree of sk_filter until jit completed freeing
>> 
>> tested on x86_64 and i386
>> 
>> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
> 
> Acked-by: Eric Dumazet <edumazet@google.com>

I've decided to apply this to 'net', thanks.

^ permalink raw reply

* Re: bug in passing file descriptors
From: David Miller @ 2013-10-07 19:12 UTC (permalink / raw)
  To: sar; +Cc: luto, netdev, mtk.manpages, ebiederm
In-Reply-To: <5253061A.7060701@nec-labs.com>

From: Steve Rago <sar@nec-labs.com>
Date: Mon, 7 Oct 2013 15:06:02 -0400

> Maybe.  So a client expecting to receive x bytes of control
> information should make sure their buffer is at least CMSG_SPACE(x)
> bytes long instead of CMSG_LEN(x) bytes long, because you feel
> compelled to copy the final padding from kernel space to user space?
> Seems wrong to me.  IMHO, the final padding should only come into play
> when calculating where the next header should begin.

Yes, all control messages must be aligned to, and be of a length of a
multiple of, "sizeof(long)".

This is the only correct way to program control messages.

^ permalink raw reply

* bug in passing file descriptors
From: Steve Rago @ 2013-10-07 18:27 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, mtk.manpages, Andy Lutomirski, Eric Biederman

[-- Attachment #1: Type: text/plain, Size: 1327 bytes --]

Sending this to a larger group at mtk's suggestion.

I recently found a bug with passing file descriptors over Unix domain sockets.  The attached program illustrates the 
problem.  I believe the source of the problem is in net/core/scm.c.  In put_cmsg(), cmlen is calculated as 
CMSG_SPACE(len) for the purposes of setting msg_controllen, but it probably should be CMSG_LEN(len), at least in this 
particular case (I didn't investigate other use cases).  On a 32-bit platform, a long is a 4-byte quantity and the file 
descriptor is already aligned to a 4-byte quantity by placing it after the cmsghdr structure.  On a 64-byte platform, 
however, a long is an 8-byte quantity, and CMSG_SPACE() assumes that len is aligned on an 8-byte boundary, which isn't a 
requirement for passing file descriptors (which are 4-byte quantities; see scm_fp_copy() to verify).  Anyway, the end 
result is that recvmsg(2) returns with msg_controllen 4 bytes larger than it should be on a 64-bit platform.  The 
attached program prints out a warning message when this happens.

I've tried this on kernels as old as 2.6.18, so it appears this bug has been around for a while.  I originally found it 
on a 3.2.0 kernel.  Michael verified it still exists on a 3.11 kernel.

Let me know if you need any more information

Steve Rago
sar@nec-labs.com

[-- Attachment #2: passfdtest.c --]
[-- Type: text/x-csrc, Size: 2641 bytes --]

#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <fcntl.h>
#include <string.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/stat.h>

void
sendfd(int sockfd, int fd)
{
	char dummybuf;
	struct iovec iov;
	struct msghdr mh;
	struct cmsghdr *cmp = malloc(CMSG_LEN(sizeof(int)));

	if (cmp == NULL) {
		fprintf(stderr, "can't malloc: %s\n", strerror(errno));
		exit(1);
	}
	iov.iov_base = &dummybuf;
	iov.iov_len = 1;
	mh.msg_iov = &iov;
	mh.msg_iovlen = 1;
	mh.msg_name = NULL;
	mh.msg_namelen = 0;
	mh.msg_control = cmp;
	mh.msg_controllen = CMSG_LEN(sizeof(int));
	cmp->cmsg_level = SOL_SOCKET;
	cmp->cmsg_type = SCM_RIGHTS;
	cmp->cmsg_len = CMSG_LEN(sizeof(int));
	*(int *)CMSG_DATA(cmp) = fd;
	if (sendmsg(sockfd, &mh, 0) < 0) {
		fprintf(stderr, "sendmsg failed: %s\n", strerror(errno));
		exit(1);
	}
	free(cmp);
}

int
recvfd(int sockfd)
{
	struct msghdr mh;
	int nfd;
	char dummybuf;
	struct iovec iov;
	long csz = CMSG_LEN(sizeof(int));
	struct cmsghdr *cmp = malloc(csz);

	if (cmp == NULL) {
		fprintf(stderr, "can't malloc: %s\n", strerror(errno));
		exit(1);
	}
	iov.iov_base = &dummybuf;
	iov.iov_len = 1;
	mh.msg_iov = &iov;
	mh.msg_iovlen = 1;
	mh.msg_name = NULL;
	mh.msg_namelen = 0;
	mh.msg_control = cmp;
	mh.msg_controllen = csz;
	if (recvmsg(sockfd, &mh, 0) < 0) {
		fprintf(stderr, "recvmsg failed: %s\n", strerror(errno));
		exit(1);
	}
	if (mh.msg_controllen == 0) {
		fprintf(stderr, "no control data in message\n");
		exit(1);
	}
	if (mh.msg_controllen != csz)
		fprintf(stderr, "WARNING: controllen %ld, should be %ld\n", mh.msg_controllen, csz);
	nfd = *(int *)CMSG_DATA(cmp);
	free(cmp);
	return(nfd);
}

int
main()
{
	int nfd, nfd2;
	int fd[2];
	struct stat sbuf[2];

	printf("sizeof(int) = %d\n", (int)sizeof(int));
	printf("sizeof(long) = %d\n", (int)sizeof(long));
	printf("sizeof(size_t) = %d\n", (int)sizeof(size_t));
	printf("CMSG_LEN(sizeof(int)) = %d\n", (int)CMSG_LEN(sizeof(int)));
	printf("CMSG_SPACE(sizeof(int)) = %d\n", (int)CMSG_SPACE(sizeof(int)));
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, fd) < 0) {
		fprintf(stderr, "can't create socket pair: %s\n", strerror(errno));
		exit(1);
	}
	if ((nfd = open("/etc/services", O_RDONLY)) < 0) {
		fprintf(stderr, "can't open /etc/services: %s", strerror(errno));
		exit(1);
	}
	sendfd(fd[0], nfd);
	nfd2 = recvfd(fd[1]);
	if (fstat(nfd, &sbuf[0]) < 0 || fstat(nfd2, &sbuf[1]) < 0) {
		fprintf(stderr, "fstat failed: %s\n", strerror(errno));
		exit(1);
	}
	if (sbuf[0].st_dev == sbuf[1].st_dev && sbuf[0].st_ino == sbuf[1].st_ino)
		printf("the files are the same\n");
	else
		printf("the files are different\n");
	exit(0);
}

^ permalink raw reply

* I need your urgent cooperation.
From: Madina Yakou @ 2013-10-07 16:31 UTC (permalink / raw)



Dear Friend,

I am contacting you to assist me in a fund transfer of $6,200,000.00 US dollars into your personal bank account for the benefit of both of us, this transaction is confidential and we MUST remain fiducially in all our dealings since I cannot execute this opportunity alone without the support of a foreigner. This is in line with the Laws governing my Bank and that is why I have contacted you in this manner.

For your role in assisting me actualize this opportunity, you will be entitled to 30% of the total sum while 70% will be for me. However, what I need is your maximum cooperation and to provide a valid bank account where my bank will transfer this fund for our benefit. By indicating your interest on assurance of trust I will send you the full details and how this business will be executed and your urgent response to this proposal will be highly appreciated.

Best regards,
Mrs. Madina.

^ permalink raw reply

* Re: bug in passing file descriptors
From: Steve Rago @ 2013-10-07 19:06 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Network Development, David Miller, Michael Kerrisk-manpages,
	Eric Biederman
In-Reply-To: <CALCETrXmmW-nRTr4-yWJnNr2GWW37KW-7B1m=kSRnmYjhMTajA@mail.gmail.com>

On 10/07/2013 02:44 PM, Andy Lutomirski wrote:
>
> ISTM that, in order for further cmsgs to be correctly decoded, all of
> the relevant things need to match.
>
> put_cmsg uses this layout: cmsghdr, padding, payload, padding.
> CMSG_SPACE matches that calculation.
>
> scm_detach_fds is the actual code path for SCM_RIGHTS.  It does the same thing.
>
> CMSG_DATA also things that there's possible padding after cmsghdr.
>
> So I think everything's okay.
>
> --Andy
>

Maybe.  So a client expecting to receive x bytes of control information should make sure their buffer is at least 
CMSG_SPACE(x) bytes long instead of CMSG_LEN(x) bytes long, because you feel compelled to copy the final padding from 
kernel space to user space?  Seems wrong to me.  IMHO, the final padding should only come into play when calculating 
where the next header should begin.

Steve

^ permalink raw reply

* Re: [PATCH] Fix the upper MTU limit in ipv6 GRE tunnel
From: David Miller @ 2013-10-07 18:48 UTC (permalink / raw)
  To: ou.ghorbel; +Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel
In-Reply-To: <CABfLueFg=tKUbZPvKFKJZviNE+x1K4akjY8aPX7otyf_9gkLQQ@mail.gmail.com>

From: Oussama Ghorbel <ou.ghorbel@gmail.com>
Date: Mon, 7 Oct 2013 18:52:15 +0100

> Sorry for that. I've added  it and I have resubmitted the patch.

Thank you.

^ permalink raw reply

* Re: bug in passing file descriptors
From: Andy Lutomirski @ 2013-10-07 18:44 UTC (permalink / raw)
  To: Steve Rago
  Cc: Network Development, David Miller, Michael Kerrisk-manpages,
	Eric Biederman
In-Reply-To: <5252FD2B.5040800@nec-labs.com>

On Mon, Oct 7, 2013 at 11:27 AM, Steve Rago <sar@nec-labs.com> wrote:
> Sending this to a larger group at mtk's suggestion.
>
> I recently found a bug with passing file descriptors over Unix domain
> sockets.  The attached program illustrates the problem.  I believe the
> source of the problem is in net/core/scm.c.  In put_cmsg(), cmlen is
> calculated as CMSG_SPACE(len) for the purposes of setting msg_controllen,
> but it probably should be CMSG_LEN(len), at least in this particular case (I
> didn't investigate other use cases).  On a 32-bit platform, a long is a
> 4-byte quantity and the file descriptor is already aligned to a 4-byte
> quantity by placing it after the cmsghdr structure.  On a 64-byte platform,
> however, a long is an 8-byte quantity, and CMSG_SPACE() assumes that len is
> aligned on an 8-byte boundary, which isn't a requirement for passing file
> descriptors (which are 4-byte quantities; see scm_fp_copy() to verify).
> Anyway, the end result is that recvmsg(2) returns with msg_controllen 4
> bytes larger than it should be on a 64-bit platform.  The attached program
> prints out a warning message when this happens.
>
> I've tried this on kernels as old as 2.6.18, so it appears this bug has been
> around for a while.  I originally found it on a 3.2.0 kernel.  Michael
> verified it still exists on a 3.11 kernel.
>
> Let me know if you need any more information

ISTM that, in order for further cmsgs to be correctly decoded, all of
the relevant things need to match.

put_cmsg uses this layout: cmsghdr, padding, payload, padding.
CMSG_SPACE matches that calculation.

scm_detach_fds is the actual code path for SCM_RIGHTS.  It does the same thing.

CMSG_DATA also things that there's possible padding after cmsghdr.

So I think everything's okay.

--Andy

^ permalink raw reply

* Re: [PATCH RFC 54/77] ntb: Ensure number of MSIs on SNB is enough for the link interrupt
From: Alexander Gordeev @ 2013-10-07 18:38 UTC (permalink / raw)
  To: Jon Mason
  Cc: linux-kernel, Bjorn Helgaas, Ralf Baechle, Michael Ellerman,
	Benjamin Herrenschmidt, Martin Schwidefsky, Ingo Molnar,
	Tejun Heo, Dan Williams, Andy King, Matt Porter, stable,
	linux-pci, linux-mips, linuxppc-dev, linux390, linux-s390, x86,
	linux-ide, iss_storagedev, linux-nvme, linux-rdma, netdev,
	e1000-devel, linux-driver
In-Reply-To: <20131007165056.GA24536@jonmason-lab>

On Mon, Oct 07, 2013 at 09:50:57AM -0700, Jon Mason wrote:
> On Sat, Oct 05, 2013 at 11:43:04PM +0200, Alexander Gordeev wrote:
> > On Wed, Oct 02, 2013 at 05:48:05PM -0700, Jon Mason wrote:
> > > On Wed, Oct 02, 2013 at 12:49:10PM +0200, Alexander Gordeev wrote:
> > > > Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
> > > > ---
> > > >  drivers/ntb/ntb_hw.c |    2 +-
> > > >  1 files changed, 1 insertions(+), 1 deletions(-)
> > > > 
> > > > diff --git a/drivers/ntb/ntb_hw.c b/drivers/ntb/ntb_hw.c
> > > > index de2062c..eccd5e5 100644
> > > > --- a/drivers/ntb/ntb_hw.c
> > > > +++ b/drivers/ntb/ntb_hw.c
> > > > @@ -1066,7 +1066,7 @@ static int ntb_setup_msix(struct ntb_device *ndev)
> > > >  		/* On SNB, the link interrupt is always tied to 4th vector.  If
> > > >  		 * we can't get all 4, then we can't use MSI-X.
> > > >  		 */
> > > > -		if (ndev->hw_type != BWD_HW) {
> > > > +		if ((rc < SNB_MSIX_CNT) && (ndev->hw_type != BWD_HW)) {
> > > 
> > > Nack, this check is unnecessary.
> > 
> > If SNB can do more than SNB_MSIX_CNT MSI-Xs then this check is needed
> > to enable less than maximum MSI-Xs in case the maximum was not allocated.
> > Otherwise SNB will fallback to single MSI instead of multiple MSI-Xs.
> 
> Per the comment in the code snippet above, "If we can't get all 4,
> then we can't use MSI-X".  There is already a check to see if more
> than 4 were acquired.  So it's not possible to hit this.  Even if it
> was, don't use SNB_MSIX_CNT here (limits.msix_cnt is the preferred
> variable).  Also, the "()" are unnecessary.

The changelog is definitely bogus. I meant here an improvement to the
existing scheme, not a conversion to the new one:

	msix_entries = msix_table_size(val);

Getting i.e. 16 vectors here.

	if (msix_entries > ndev->limits.msix_cnt) {
		rc = -EINVAL;
		goto err;
	}

Upper limit check i.e. succeeds.

	[...]

	rc = pci_enable_msix(pdev, ndev->msix_entries, msix_entries);

pci_enable_msix() does not success and returns i.e. 8 here, should retry.

	if (rc < 0)
		goto err1;
	if (rc > 0) {
		/* On SNB, the link interrupt is always tied to 4th vector.  If
		 * we can't get all 4, then we can't use MSI-X.
		 */
		if (ndev->hw_type != BWD_HW) {

On SNB bail out here, although could have continue with 8 vectors.
Can only use SNB_MSIX_CNT here, since limits.msix_cnt is the upper limit.

			rc = -EIO;
			goto err1;
		}

		[...]
	}

-- 
Regards,
Alexander Gordeev
agordeev@redhat.com

^ permalink raw reply

* [raw v1 2/4] [NET] Use raw_cpu ops for SNMP stats
From: Christoph Lameter @ 2013-10-07 18:31 UTC (permalink / raw)
  To: Tejun Heo
  Cc: akpm, Eric Dumazet, netdev, Steven Rostedt, linux-kernel,
	Ingo Molnar, Peter Zijlstra, Thomas Gleixner
In-Reply-To: <20131007183226.334180014@linux.com>

SNMP stats are not protected by preemption but by bh handling.
Since protection is provided outside of preemption raw_cpu_ops
need to be used to avoid false positives.

Cc: Eric Dumazet <edumazet@google.com>
CC: netdev@vger.kernel.org
Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/include/net/snmp.h
===================================================================
--- linux.orig/include/net/snmp.h	2013-10-07 09:16:07.595206864 -0500
+++ linux/include/net/snmp.h	2013-10-07 09:16:07.591206909 -0500
@@ -126,7 +126,7 @@ struct linux_xfrm_mib {
 	extern __typeof__(type) __percpu *name[SNMP_ARRAY_SZ]
 
 #define SNMP_INC_STATS_BH(mib, field)	\
-			__this_cpu_inc(mib[0]->mibs[field])
+			raw_cpu_inc(mib[0]->mibs[field])
 
 #define SNMP_INC_STATS_USER(mib, field)	\
 			this_cpu_inc(mib[0]->mibs[field])
@@ -141,7 +141,7 @@ struct linux_xfrm_mib {
 			this_cpu_dec(mib[0]->mibs[field])
 
 #define SNMP_ADD_STATS_BH(mib, field, addend)	\
-			__this_cpu_add(mib[0]->mibs[field], addend)
+			raw_cpu_add(mib[0]->mibs[field], addend)
 
 #define SNMP_ADD_STATS_USER(mib, field, addend)	\
 			this_cpu_add(mib[0]->mibs[field], addend)

^ permalink raw reply

* Re: IPv6 kernel warning
From: dormando @ 2013-10-07 18:13 UTC (permalink / raw)
  To: Yuchung Cheng; +Cc: Michele Baldessari, Russell King - ARM Linux, netdev
In-Reply-To: <CAK6E8=eNFG0fic+NYeQBjEoUQMeZKSAT3LU0JSQyE-4i_cDaZQ@mail.gmail.com>

> >
> > there's been multiple reports about this one:
> > https://bugzilla.redhat.com/show_bug.cgi?id=989251
> > http://bugzilla.kernel.org/show_bug.cgi?id=60779
> >
> > Could you try Yuchung's debug patch?
> > http://www.spinics.net/lists/netdev/msg250193.html
> Yes it looks like the same bug. Please try that patch to help identify
> this elusive bug.
>

Hi!

We get this one a few times a day in production. Here's a warning with
your debug trace in the line immediately following:
(I censored a few things)

 [125311.721950] ------------[ cut here ]------------
 [125311.721961] WARNING: at net/ipv4/tcp_input.c:2776 tcp_fastretrans_alert+0xb58/0xc80()
 [125311.721962] Modules linked in: bridge ip_vs macvlan coretemp crc32_pclmul ghash_clmulni_intel gpio_ich ipmi_watchdog microcode ipmi_devintf sb_edac lpc_ich edac_core mfd_core ipmi_si ipmi_msghandler iptable_nat nf_nat_ipv4 nf_nat ixgbe igb mdio i2c_algo_bit ptp pps_core
 [125311.721981] CPU: 11 PID: 0 Comm: swapper/11 Not tainted 3.10.13 #1
 [125311.721982] Hardware name: Supermicro XXXXXXXXXXX, BIOS 1.1 10/03/2012
 [125311.721984]  ffffffff81a82007 ffff88407fc63958 ffffffff816bb9cc ffff88407fc63998
 [125311.721986]  ffffffff8104b940 00ff8840ad904f82 ffff883b8a165b00 0000000000004120
 [125311.721989]  0000000000000001 0000000000000019 0000000000000000 ffff88407fc639a8
 [125311.721991] Call Trace:
 [125311.721992]  <IRQ>  [<ffffffff816bb9cc>] dump_stack+0x19/0x1d
 [125311.722002]  [<ffffffff8104b940>] warn_slowpath_common+0x70/0xa0
 [125311.722005]  [<ffffffff8104b98a>] warn_slowpath_null+0x1a/0x20
 [125311.722007]  [<ffffffff81616db8>] tcp_fastretrans_alert+0xb58/0xc80
 [125311.722011]  [<ffffffff8161891f>] tcp_ack+0x6df/0xe90
 [125311.722016]  [<ffffffff8164e0ca>] ? ipt_do_table+0x22a/0x680
 [125311.722018]  [<ffffffff816194b3>] ? tcp_validate_incoming+0x63/0x320
 [125311.722021]  [<ffffffff8161a55c>] tcp_rcv_established+0x2cc/0x810
 [125311.722023]  [<ffffffff81622c84>] tcp_v4_do_rcv+0x254/0x4f0
 [125311.722025]  [<ffffffff816245ac>] tcp_v4_rcv+0x5fc/0x750
 [125311.722027]  [<ffffffff815ffa00>] ? ip_rcv+0x350/0x350
 [125311.722032]  [<ffffffff815df3ad>] ? nf_hook_slow+0x7d/0x160
 [125311.722034]  [<ffffffff815ffa00>] ? ip_rcv+0x350/0x350
 [125311.722036]  [<ffffffff815fface>] ip_local_deliver_finish+0xce/0x250
 [125311.722037]  [<ffffffff815ffc9c>] ip_local_deliver+0x4c/0x80
 [125311.722039]  [<ffffffff815ff329>] ip_rcv_finish+0x119/0x360
 [125311.722040]  [<ffffffff815ff8e0>] ip_rcv+0x230/0x350
 [125311.722046]  [<ffffffff815b4067>] __netif_receive_skb_core+0x477/0x600
 [125311.722049]  [<ffffffff815b4217>] __netif_receive_skb+0x27/0x70
 [125311.722051]  [<ffffffff815b4354>] process_backlog+0xf4/0x1e0
 [125311.722053]  [<ffffffff815b4b45>] net_rx_action+0xf5/0x250
 [125311.722056]  [<ffffffff81053a5f>] __do_softirq+0xef/0x270
 [125311.722058]  [<ffffffff81053cb5>] irq_exit+0x95/0xa0
 [125311.722062]  [<ffffffff816c8f26>] do_IRQ+0x66/0xe0
 [125311.722065]  [<ffffffff816bf62a>] common_interrupt+0x6a/0x6a
 [125311.722065]  <EOI>  [<ffffffff8100abf1>] ? default_idle+0x21/0xc0
 [125311.722082]  [<ffffffff8100a54f>] arch_cpu_idle+0xf/0x20
 [125311.722086]  [<ffffffff8108f353>] cpu_startup_entry+0xb3/0x230
 [125311.722091]  [<ffffffff816b439e>] start_secondary+0x1dc/0x1e3
 [125311.722093] ---[ end trace e77cd5ba583fcbe9 ]---
 [125311.722096] 355.355.1.355:22496 F0x4120 S1 s7 IF25+17-1-24f0 ur57 rr3 rt0 um0 hs23120 nxt23120

It's been happening with all 3.10 kernels, and the one above is .13 as
stated in the trace.

^ permalink raw reply

* Re: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern
From: Tejun Heo @ 2013-10-07 18:21 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: linux-kernel, Bjorn Helgaas, Ralf Baechle, Michael Ellerman,
	Benjamin Herrenschmidt, Martin Schwidefsky, Ingo Molnar,
	Dan Williams, Andy King, Jon Mason, Matt Porter, linux-pci,
	linux-mips, linuxppc-dev, linux390, linux-s390, x86, linux-ide,
	iss_storagedev, linux-nvme, linux-rdma, netdev, e1000-devel,
	linux-driver, Solarflare linux maintainers, VMware, Inc.,
	linux-sc
In-Reply-To: <cover.1380703262.git.agordeev@redhat.com>

Hello, Alexander.

On Wed, Oct 02, 2013 at 12:48:16PM +0200, Alexander Gordeev wrote:
> Alexander Gordeev (77):
>   PCI/MSI: Fix return value when populate_msi_sysfs() failed
>   PCI/MSI/PPC: Fix wrong RTAS error code reporting
>   PCI/MSI/s390: Fix single MSI only check
>   PCI/MSI/s390: Remove superfluous check of MSI type
>   PCI/MSI: Convert pci_msix_table_size() to a public interface
>   PCI/MSI: Factor out pci_get_msi_cap() interface
>   PCI/MSI: Re-design MSI/MSI-X interrupts enablement pattern
>   PCI/MSI: Get rid of pci_enable_msi_block_auto() interface
>   ahci: Update MSI/MSI-X interrupts enablement code
>   ahci: Check MRSM bit when multiple MSIs enabled
...

Whee.... that's a lot more than I expected.  I was just scanning
multiple msi users.  Maybe we can stage the work in more manageable
steps so that you don't have to go through massive conversion only to
do it all over again afterwards and likewise people don't get
bombarded on each iteration?  Maybe we can first update pci / msi code
proper, msi and then msix?

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [PATCH RFC 07/77] PCI/MSI: Re-design MSI/MSI-X interrupts enablement pattern
From: Tejun Heo @ 2013-10-07 18:17 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: linux-kernel, Bjorn Helgaas, Ralf Baechle, Michael Ellerman,
	Benjamin Herrenschmidt, Martin Schwidefsky, Ingo Molnar,
	Dan Williams, Andy King, Jon Mason, Matt Porter, linux-pci,
	linux-doc, linux-mips, linuxppc-dev, linux390, linux-s390, x86,
	linux-ide, iss_storagedev, linux-nvme, linux-rdma, netdev,
	e1000-devel, linux-driver, Solarflare linux maintainers,
	"VMware, Inc." <pv-dr
In-Reply-To: <d8c36203ada6efbfa9f7ce92c2f713ee3b6d6b8d.1380703262.git.agordeev@redhat.com>

Hello,

On Wed, Oct 02, 2013 at 12:48:23PM +0200, Alexander Gordeev wrote:
> +static int foo_driver_enable_msi(struct foo_adapter *adapter, int nvec)
> +{
> +	rc = pci_get_msi_cap(adapter->pdev);
> +	if (rc < 0)
> +		return rc;
> +
> +	nvec = min(nvec, rc);
> +	if (nvec < FOO_DRIVER_MINIMUM_NVEC) {
> +		return -ENOSPC;
> +
> +	rc = pci_enable_msi_block(adapter->pdev, nvec);
> +	return rc;
> +}

If there are many which duplicate the above pattern, it'd probably be
worthwhile to provide a helper?  It's usually a good idea to reduce
the amount of boilerplate code in drivers.

>  static int foo_driver_enable_msix(struct foo_adapter *adapter, int nvec)
>  {
> +	rc = pci_msix_table_size(adapter->pdev);
> +	if (rc < 0)
> +		return rc;
> +
> +	nvec = min(nvec, rc);
> +	if (nvec < FOO_DRIVER_MINIMUM_NVEC) {
> +		return -ENOSPC;
> +
> +	for (i = 0; i < nvec; i++)
> +		adapter->msix_entries[i].entry = i;
> +
> +	rc = pci_enable_msix(adapter->pdev, adapter->msix_entries, nvec);
> +	return rc;
>  }

Ditto.

> @@ -975,7 +951,7 @@ int pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries, int nvec)
>  	if (nr_entries < 0)
>  		return nr_entries;
>  	if (nvec > nr_entries)
> -		return nr_entries;
> +		return -EINVAL;
>  
>  	/* Check for any invalid entries */
>  	for (i = 0; i < nvec; i++) {

If we do things this way, it breaks all drivers using this interface
until they're converted, right?  Also, it probably isn't the best idea
to flip the behavior like this as this can go completely unnoticed (no
compiler warning or anything, the same function just behaves
differently).  Maybe it'd be a better idea to introduce a simpler
interface that most can be converted to?

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [PATCH RFC 05/77] PCI/MSI: Convert pci_msix_table_size() to a public interface
From: Tejun Heo @ 2013-10-07 18:10 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, Bjorn Helgaas, Ralf Baechle,
	Michael Ellerman, Benjamin Herrenschmidt, Martin Schwidefsky,
	Ingo Molnar, Dan Williams, Andy King, Jon Mason, Matt Porter,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-mips-6z/3iImG2C8G8FEW9MqTrA,
	linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux390-tA70FqPdS9bQT0dZR+AlfA,
	linux-s390-u79uwXL29TY76Z2rM5mHXA, x86-DgEjT+Ai2ygdnm+yROfE0A,
	linux-ide-u79uwXL29TY76Z2rM5mHXA, iss_storagedev-VXdhtT5mjnY,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	e1000-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-driver-h88ZbnxC6KDQT0dZR+AlfA, Solarflare linux maintainers,
	VMware, Inc., linux-sc
In-Reply-To: <e8b51bd48c24d0fc4ee8adea5c138c9bf84191e9.1380703262.git.agordeev-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

Hello,

On Wed, Oct 02, 2013 at 12:48:21PM +0200, Alexander Gordeev wrote:
> Make pci_msix_table_size() to return a error code if the device
> does not support MSI-X. This update is needed to facilitate a
> forthcoming re-design MSI/MSI-X interrupts enabling pattern.
> 
> Device drivers will use this interface to obtain maximum number
> of MSI-X interrupts the device supports and use that value in
> the following call to pci_enable_msix() interface.
> 
> Signed-off-by: Alexander Gordeev <agordeev-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

Hmmm... I probably missed something but why is this necessary?  To
discern between -EINVAL and -ENOSPC?  If so, does that really matter?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern
From: Tejun Heo @ 2013-10-07 18:01 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: Benjamin Herrenschmidt, Ben Hutchings,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Bjorn Helgaas, Ralf Baechle,
	Michael Ellerman, Martin Schwidefsky, Ingo Molnar, Dan Williams,
	Andy King, Jon Mason, Matt Porter,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-mips-6z/3iImG2C8G8FEW9MqTrA,
	linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux390-tA70FqPdS9bQT0dZR+AlfA,
	linux-s390-u79uwXL29TY76Z2rM5mHXA, x86-DgEjT+Ai2ygdnm+yROfE0A,
	linux-ide-u79uwXL29TY76Z2rM5mHXA, iss_storagedev-VXdhtT5mjnY,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	e1000-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-driver-h88ZbnxC6KDQT0dZR+AlfA, Solarflare linux maintainers,
	"VMw
In-Reply-To: <20131006071027.GA29143-hdGaXg0bp3uRXgp2RCiI5R/sF2h8X+2i0E9HWUfgJXw@public.gmane.org>

Hey, guys.

On Sun, Oct 06, 2013 at 09:10:30AM +0200, Alexander Gordeev wrote:
> On Sun, Oct 06, 2013 at 05:19:46PM +1100, Benjamin Herrenschmidt wrote:
> > On Sun, 2013-10-06 at 08:02 +0200, Alexander Gordeev wrote:
> > > In fact, in the current design to address the quota race decently the
> > > drivers would have to protect the *loop* to prevent the quota change
> > > between a pci_enable_msix() returned a positive number and the the next
> > > call to pci_enable_msix() with that number. Is it doable?
> > 
> > I am not advocating for the current design, simply saying that your
> > proposal doesn't address this issue while Ben's does.

Hmmm... yean, the race condition could be an issue as multiple msi
allocation might fail even if the driver can and explicitly handle
multiple allocation if the quota gets reduced inbetween.

> There is one major flaw in min-max approach - the generic MSI layer
> will have to take decisions on exact number of MSIs to request, not
> device drivers.

The min-max approach would actually be pretty nice for the users which
actually care about this.

> This will never work for all devices, because there might be specific
> requirements which are not covered by the min-max. That is what Ben
> described "...say, any even number within a certain range". Ben suggests
> to leave the existing loop scheme to cover such devices, which I think is
> not right.

if it could work.

> What about introducing pci_lock_msi() and pci_unlock_msi() and let device
> drivers care about their ranges and specifics in race-safe manner?
> I do not call to introduce it right now (since it appears pSeries has not
> been hitting the race for years) just as a possible alternative to Ben's
> proposal.

I don't think the same race condition would happen with the loop.  The
problem case is where multiple msi(x) allocation fails completely
because the global limit went down before inquiry and allocation.  In
the loop based interface, it'd retry with the lower number.

As long as the number of drivers which need this sort of adaptive
allocation isn't too high and the common cases can be made simple, I
don't think the "complex" part of interface is all that important.
Maybe we can have reserve / cancel type interface or just keep the
loop with more explicit function names (ie. try_enable or something
like that).

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] Fix the upper MTU limit in ipv6 GRE tunnel
From: Oussama Ghorbel @ 2013-10-07 17:59 UTC (permalink / raw)
  To: Oussama Ghorbel, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, linux-kernel
In-Reply-To: <20131007020218.GF9295@order.stressinduktion.org>

On Mon, Oct 7, 2013 at 3:02 AM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On Sun, Oct 06, 2013 at 08:18:15PM +0100, Oussama Ghorbel wrote:
>> Yes, to summarize, the idea of this patch was to fix the incoherence
>> in the condition of ip6gre_tunnel_change_mtu function
>>
>>   if (new_mtu < 68 ||
>>    new_mtu > 0xFFF8 - dev->hard_header_len - tunnel->hlen)
>>
>> From the ip6gre_tnl_link_config function we can see that:
>> The variable addend is equal the ipv6 header + gre header (including
>> the gre options)
>> On the other hand hard_header_len equal to the header of the lower
>> layer + addend.
>> So the quantity - (dev->hard_header_len + tunnel->hlen) equals - (eth
>> header + ipv6 header + gre header + ipv6 header + gre header) which by
>> no means this would represent anything!  (I've just taken ipv6 over
>> ethernet as example)
>>
>> As we have seen there is another approach to fix this issue is to
>> re-factor the hlen to hold only the length of gre as it's done for
>> ipv4 gre, however the solution provided in the patch seems to be
>> regression risk-less.
>
> I agree, it actually does not worsen the situation:
>
> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
>
>> Although the value hold by hlen is not coherent with the variable name
>> nor with ipv4, I think there is an advantage of the current approach
>> of ipv6 hlen over ipv4 hlen, because we save the calculation of ipv6
>> header each time. Ex:
>> In ipv4 gre and in the function ipgre_header:
>> iph = (struct iphdr *)skb_push(skb, t->hlen + sizeof(*iph));
>> In ipv6 and in the function ip6gre_header
>> ipv6h = (struct ipv6hdr *)skb_push(skb, t->hlen);
>
> I see your point. But we should take care that t->hlen is always initialized,
> regardless if we got a route and outgoing device or not.
>

OK, I'll investigate on this and I'll open a dedicated thread mail/send patch...

> Greetings,
>
>   Hannes
>
Thanks,
Oussama

^ permalink raw reply

* Re: [PATCH] Fix the upper MTU limit in ipv6 GRE tunnel
From: Oussama Ghorbel @ 2013-10-07 17:52 UTC (permalink / raw)
  To: David Miller
  Cc: Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy, netdev, linux-kernel
In-Reply-To: <20131007.133849.1821189531065820610.davem@davemloft.net>

Sorry for that. I've added  it and I have resubmitted the patch.

Thanks

On Mon, Oct 7, 2013 at 6:38 PM, David Miller <davem@davemloft.net> wrote:
> From: Oussama Ghorbel <ou.ghorbel@gmail.com>
> Date: Mon, 7 Oct 2013 18:34:53 +0100
>
>> OK, I've resubmitted the patch with the proper Subject line.
>> The new mail subject is: [PATCH] ipv6: Fix the upper MTU limit in GRE tunnel
>
> You forgot to propagate the "Acked-by: " tag that was given by
> reviewers of your patch.

^ permalink raw reply

* [PATCH] ipv6: Fix the upper MTU limit in GRE tunnel
From: Oussama Ghorbel @ 2013-10-07 17:50 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy
  Cc: netdev, linux-kernel, Oussama Ghorbel

Unlike ipv4, the struct member hlen holds the length of the GRE and ipv6
headers. This length is also counted in dev->hard_header_len.
Perhaps, it's more clean to modify the hlen to count only the GRE header
without ipv6 header as the variable name suggest, but the simple way to fix
this without regression risk is simply modify the calculation of the limit
in ip6gre_tunnel_change_mtu function.
Verified in kernel version v3.11.

Signed-off-by: Oussama Ghorbel <ou.ghorbel@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 net/ipv6/ip6_gre.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 90747f1..41487ab 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -1175,9 +1175,8 @@ done:
 
 static int ip6gre_tunnel_change_mtu(struct net_device *dev, int new_mtu)
 {
-	struct ip6_tnl *tunnel = netdev_priv(dev);
 	if (new_mtu < 68 ||
-	    new_mtu > 0xFFF8 - dev->hard_header_len - tunnel->hlen)
+	    new_mtu > 0xFFF8 - dev->hard_header_len)
 		return -EINVAL;
 	dev->mtu = new_mtu;
 	return 0;
-- 
1.7.9.5

^ permalink raw reply related

* Re: [PATCH] Fix the upper MTU limit in ipv6 GRE tunnel
From: David Miller @ 2013-10-07 17:38 UTC (permalink / raw)
  To: ou.ghorbel; +Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel
In-Reply-To: <CABfLueGqa8wJv+hfcNQRmWM2tj6zMC5Ajts=-no7GJA1t09dnQ@mail.gmail.com>

From: Oussama Ghorbel <ou.ghorbel@gmail.com>
Date: Mon, 7 Oct 2013 18:34:53 +0100

> OK, I've resubmitted the patch with the proper Subject line.
> The new mail subject is: [PATCH] ipv6: Fix the upper MTU limit in GRE tunnel

You forgot to propagate the "Acked-by: " tag that was given by
reviewers of your patch.

^ permalink raw reply

* Re: [PATCH] Fix the upper MTU limit in ipv6 GRE tunnel
From: Oussama Ghorbel @ 2013-10-07 17:34 UTC (permalink / raw)
  To: David Miller
  Cc: Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy, netdev, linux-kernel
In-Reply-To: <20131007.125326.895480090584703586.davem@davemloft.net>

OK, I've resubmitted the patch with the proper Subject line.
The new mail subject is: [PATCH] ipv6: Fix the upper MTU limit in GRE tunnel


Thanks,
Oussama

On Mon, Oct 7, 2013 at 5:53 PM, David Miller <davem@davemloft.net> wrote:
> From: Oussama Ghorbel <ou.ghorbel@gmail.com>
> Date: Fri,  4 Oct 2013 10:52:13 +0100
>
>> Unlike ipv4, the struct member hlen holds the length of the GRE and ipv6
>> headers. This length is also counted in dev->hard_header_len.
>> Perhaps, it's more clean to modify the hlen to count only the GRE header
>> without ipv6 header as the variable name suggest, but the simple way to fix
>> this without regression risk is simply modify the calculation of the limit
>> in ip6gre_tunnel_change_mtu function.
>> Verified in kernel version v3.11.
>>
>> Signed-off-by: Oussama Ghorbel <ou.ghorbel@gmail.com>
>
> Please resubmit this with a proper Subject line.
>
> It should be "[PATCH] " then a legitimate subsystem prefix.
> In this case "ipv6: " would be appropriate.  And then
> the "ipv6" in your existing Subject text is redundant so
> can be removed.
>
> Thanks.

^ permalink raw reply

* [PATCH] ipv6: Fix the upper MTU limit in GRE tunnel
From: Oussama Ghorbel @ 2013-10-07 17:29 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy
  Cc: netdev, linux-kernel, Oussama Ghorbel

Unlike ipv4, the struct member hlen holds the length of the GRE and ipv6
headers. This length is also counted in dev->hard_header_len.
Perhaps, it's more clean to modify the hlen to count only the GRE header
without ipv6 header as the variable name suggest, but the simple way to fix
this without regression risk is simply modify the calculation of the limit
in ip6gre_tunnel_change_mtu function.
Verified in kernel version v3.11.

Signed-off-by: Oussama Ghorbel <ou.ghorbel@gmail.com>
---
 net/ipv6/ip6_gre.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 90747f1..41487ab 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -1175,9 +1175,8 @@ done:
 
 static int ip6gre_tunnel_change_mtu(struct net_device *dev, int new_mtu)
 {
-	struct ip6_tnl *tunnel = netdev_priv(dev);
 	if (new_mtu < 68 ||
-	    new_mtu > 0xFFF8 - dev->hard_header_len - tunnel->hlen)
+	    new_mtu > 0xFFF8 - dev->hard_header_len)
 		return -EINVAL;
 	dev->mtu = new_mtu;
 	return 0;
-- 
1.7.9.5

^ permalink raw reply related

* Re: Neterion and UFO handling [was: Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO]
From: Hannes Frederic Sowa @ 2013-10-07 17:27 UTC (permalink / raw)
  To: Jon Mason
  Cc: jdmason, Jiri Pirko, netdev, yoshfuji, davem, kuznet, jmorris,
	kaber, herbert, eric.dumazet
In-Reply-To: <20131007171941.GB24536@jonmason-lab>

On Mon, Oct 07, 2013 at 10:19:41AM -0700, Jon Mason wrote:
> Both are valid email addresses, but I prefer to address non-Intel
> issues with my kudzu.us email account.

Ok.

> I apologize for not addressing your question yet.  What you are saying
> makes sense, but I want to dig through the documentation and verify.
> However, I haven't had the time.  I'll brew up a pot of coffee when I
> get home and I'll get an answer to you before I go to bed tonight :)

No need to hurry, I do not want to induce stress. ;)

I just wanted to make sure this does not get forgotten so we can apply Jiri's
patches anytime soon.

Enjoy your coffee,

  Hannes

^ permalink raw reply

* Re: Neterion and UFO handling [was: Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO]
From: Jon Mason @ 2013-10-07 17:19 UTC (permalink / raw)
  To: jdmason, Jiri Pirko, netdev, yoshfuji, davem, kuznet, jmorris,
	kaber, herbert, eric.dumazet
In-Reply-To: <20131007165343.GI9295@order.stressinduktion.org>

On Mon, Oct 07, 2013 at 06:53:43PM +0200, Hannes Frederic Sowa wrote:
> Hi Jon!
> 
> Maybe I got the wrong email address for the neterion driver from the
> maintainers filer? If you are (still) affiliated with the neterion driver,
> maybe you could have a short look at the quoted mail below?

Both are valid email addresses, but I prefer to address non-Intel
issues with my kudzu.us email account.

I apologize for not addressing your question yet.  What you are saying
makes sense, but I want to dig through the documentation and verify.
However, I haven't had the time.  I'll brew up a pot of coffee when I
get home and I'll get an answer to you before I go to bed tonight :)

Thanks,
Jon

> 
> Thanks,
> 
>   Hannes
> 
> On Wed, Oct 02, 2013 at 06:27:30PM +0200, Hannes Frederic Sowa wrote:
> > Hi!
> > 
> > I have a question regarding UFO and the neterion driver, which as the only one
> > advertises hardware UFO support:
> > 
> > The patch discusses in this thread
> > http://thread.gmane.org/gmane.linux.network/284348/focus=285405 could change
> > some semantics how packets are constructed before submitted to the driver.
> > 
> > We currently guarantee that we have the MAC/IP/UDP header in skb->data and the
> > payload is attached in the skb's frags. With the changes discussed in this
> > thread it is possible that we also append to skb->data some amount of data
> > which is not targeted for the header. From reading the driver sources it seems
> > the hardware interprets the skb->data to skb_headlen as the header, so we
> > could include some data in the fragments more than once.
> > 
> > Do you think this change is safe? Otherwise I would suggest that the UFO
> > capability is switched off until the driver signals the hardware the start and
> > end of the headers correctly?
> > 
> > I left the mail below intact which points to the specific place in s2io.c
> > where I think the problem is.
> > 
> > On Wed, Oct 02, 2013 at 08:14:27AM -0700, Eric Dumazet wrote:
> > > On Wed, 2013-10-02 at 15:03 +0200, Hannes Frederic Sowa wrote:
> > > > On Wed, Oct 02, 2013 at 02:12:07PM +0200, Hannes Frederic Sowa wrote:
> > > > > Hi Eric!
> > > > > 
> > > > > On Wed, Oct 02, 2013 at 03:41:28AM -0700, Eric Dumazet wrote:
> > > > > > On Wed, 2013-10-02 at 10:58 +0200, Jiri Pirko wrote:
> > > > > > > Wed, Oct 02, 2013 at 01:25:34AM CEST, hannes@stressinduktion.org wrote:
> > > > > > > >-	if (((length > mtu) || (skb && skb_is_gso(skb))) &&
> > > > > > > >+	if (((length > mtu) || (skb && skb_has_frags(skb))) &&
> > > > > > 
> > > > > > > 
> > > > > > > This seems correct to me. sk_is_gso would work as well is you apply my
> > > > > > > patch "[patch net] ip6_output: do skb ufo init for peeked non ufo skb as
> > > > > > > well" which does the setting of gso_size.
> > > > > > 
> > > > > > Well, skb having frags or not should not be a concern :
> > > > > > Thats an allocation choice (lets say to avoid high order allocations). 
> > > > > > 
> > > > > > Setting gso_size is probably better.
> > > > > 
> > > > > e89e9cf539a28df7d0eb1d0a545368e9920b34ac ("[IPv4/IPv6]: UFO Scatter-gather
> > > > > approach") states:
> > > > > 
> > > > > "
> > > > > skb->data will contain MAC/IP/UDP header and skb_shinfo(skb)->frags[]
> > > > > contains the data payload. The skb->ip_summed will be set to CHECKSUM_HW
> > > > > indicating that hardware has to do checksum calculation. Hardware should
> > > > > compute the UDP checksum of complete datagram and also ip header checksum of
> > > > > each fragmented IP packet.
> > > > > "
> > > > > 
> > > > > This is the reason why I tried not to update the gso_size. If it is ok, I am
> > > > > fine with that.
> > > > 
> > > > Especially, drivers/net/ethernet/neterion/s2io.c states that the first dma
> > > > mapping (skb->data with skb_headlen, which is fine) is used as the inband
> > > > header:
> > > > 
> > > >         if (offload_type == SKB_GSO_UDP)
> > > >                 frg_cnt++; /* as Txd0 was used for inband header */
> > > > 
> > > > That is my only other hint that we maybe should not update gso_size and
> > > > gso_type. I guess software fallback does not have this problem, but I won't
> > > > have time to check until this evening.
> > > > 
> > > > I am really not sure if just setting gso_size does not break neterion UFO
> > > > offloading. :/
> > > 
> > > Well, just ask Jon Mason to double check ;)
> > > 
> > > I think the commit intent was to set gso_size :
> > > 
> > >    skb_shinfo(skb)->ufo_size will indicate the length of data part in each IP
> > >     fragment going out of the adapter after IP fragmentation by hardware.
> > > 
> > > The fact that it states "skb->data will contain MAC/IP/UDP header and
> > > skb_shinfo(skb)->frags[] contains the data payload." seems irrelevant.
> > > 
> > > If Neterion driver mandates that skb->head *only* contains the
> > > MAC/IP/UDP header, that should be handled in the driver itself.
> > 
> > Thanks Eric for clearing this up.
> > 
> > I really thought it would be the common pattern for UFO to have only headers
> > in skb->data, so I didn't bother to ask in the first place.
> > 
> > Thanks,
> > 
> >   Hannes
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> 

^ permalink raw reply

* [PATCH v2] static_key: WARN on usage before jump_label_init was called
From: Hannes Frederic Sowa @ 2013-10-07 16:57 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: netdev, linux-kernel, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Jason Baron, Peter Zijlstra, Eric Dumazet,
	andi @ firstfloor. org David S. Miller, x86
In-Reply-To: <20131007115152.0964afde@gandalf.local.home>

On Mon, Oct 07, 2013 at 11:51:52AM -0400, Steven Rostedt wrote:
> On Sun, 6 Oct 2013 20:29:19 +0200
> Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:
> 
> 
> > diff --git a/lib/jump_label_initialized.c b/lib/jump_label_initialized.c
> > new file mode 100644
> > index 0000000..a668a40
> > --- /dev/null
> > +++ b/lib/jump_label_initialized.c
> > @@ -0,0 +1,6 @@
> > +#include <linux/types.h>
> > +#include <linux/cache.h>
> > +
> > +bool static_key_initialized __read_mostly = false;
> > +EXPORT_SYMBOL_GPL(static_key_initialized);
> > +
> 
> So far, the only thing I don't like about this patch is the creation of
> this file for the sole purpose of adding this variable.
> 
> Perhaps it can just be added to init/main.c?

Yes, init/main.c seems to be a good fit. I also fixed some whitespace issues
and simplified the macro (as it is only one statement).

[PATCH v2] static_key: WARN on usage before jump_label_init was called

Based on a patch from Andi Kleen.

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/linux/jump_label.h           | 10 ++++++++++
 include/linux/jump_label_ratelimit.h |  2 ++
 init/main.c                          |  7 +++++++
 kernel/jump_label.c                  |  5 +++++
 4 files changed, 24 insertions(+)

diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index a507907..e96be72 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -48,6 +48,13 @@
 
 #include <linux/types.h>
 #include <linux/compiler.h>
+#include <linux/bug.h>
+
+extern bool static_key_initialized;
+
+#define STATIC_KEY_CHECK_USE() WARN(!static_key_initialized,		      \
+				    "%s used before call to jump_label_init", \
+				    __func__)
 
 #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
 
@@ -128,6 +135,7 @@ struct static_key {
 
 static __always_inline void jump_label_init(void)
 {
+	static_key_initialized = true;
 }
 
 static __always_inline bool static_key_false(struct static_key *key)
@@ -146,11 +154,13 @@ static __always_inline bool static_key_true(struct static_key *key)
 
 static inline void static_key_slow_inc(struct static_key *key)
 {
+	STATIC_KEY_CHECK_USE();
 	atomic_inc(&key->enabled);
 }
 
 static inline void static_key_slow_dec(struct static_key *key)
 {
+	STATIC_KEY_CHECK_USE();
 	atomic_dec(&key->enabled);
 }
 
diff --git a/include/linux/jump_label_ratelimit.h b/include/linux/jump_label_ratelimit.h
index 1137883..089f70f 100644
--- a/include/linux/jump_label_ratelimit.h
+++ b/include/linux/jump_label_ratelimit.h
@@ -23,12 +23,14 @@ struct static_key_deferred {
 };
 static inline void static_key_slow_dec_deferred(struct static_key_deferred *key)
 {
+	STATIC_KEY_CHECK_USE();
 	static_key_slow_dec(&key->key);
 }
 static inline void
 jump_label_rate_limit(struct static_key_deferred *key,
 		unsigned long rl)
 {
+	STATIC_KEY_CHECK_USE();
 }
 #endif	/* HAVE_JUMP_LABEL */
 #endif	/* _LINUX_JUMP_LABEL_RATELIMIT_H */
diff --git a/init/main.c b/init/main.c
index af310af..6735b631 100644
--- a/init/main.c
+++ b/init/main.c
@@ -136,6 +136,13 @@ static char *execute_command;
 static char *ramdisk_execute_command;
 
 /*
+ * Used to generate warnings if static_key manipulation functions are used
+ * before boot
+ */
+bool static_key_initialized __read_mostly = false;
+EXPORT_SYMBOL_GPL(static_key_initialized);
+
+/*
  * If set, this is an indication to the drivers that reset the underlying
  * device before going ahead with the initialization otherwise driver might
  * rely on the BIOS and skip the reset operation.
diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index 297a924..9019f15 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -58,6 +58,7 @@ static void jump_label_update(struct static_key *key, int enable);
 
 void static_key_slow_inc(struct static_key *key)
 {
+	STATIC_KEY_CHECK_USE();
 	if (atomic_inc_not_zero(&key->enabled))
 		return;
 
@@ -103,12 +104,14 @@ static void jump_label_update_timeout(struct work_struct *work)
 
 void static_key_slow_dec(struct static_key *key)
 {
+	STATIC_KEY_CHECK_USE();
 	__static_key_slow_dec(key, 0, NULL);
 }
 EXPORT_SYMBOL_GPL(static_key_slow_dec);
 
 void static_key_slow_dec_deferred(struct static_key_deferred *key)
 {
+	STATIC_KEY_CHECK_USE();
 	__static_key_slow_dec(&key->key, key->timeout, &key->work);
 }
 EXPORT_SYMBOL_GPL(static_key_slow_dec_deferred);
@@ -116,6 +119,7 @@ EXPORT_SYMBOL_GPL(static_key_slow_dec_deferred);
 void jump_label_rate_limit(struct static_key_deferred *key,
 		unsigned long rl)
 {
+	STATIC_KEY_CHECK_USE();
 	key->timeout = rl;
 	INIT_DELAYED_WORK(&key->work, jump_label_update_timeout);
 }
@@ -212,6 +216,7 @@ void __init jump_label_init(void)
 		key->next = NULL;
 #endif
 	}
+	static_key_initialized = true;
 	jump_label_unlock();
 }
 
-- 
1.8.3.1

^ permalink raw reply related

* Re: Neterion and UFO handling [was: Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO]
From: Hannes Frederic Sowa @ 2013-10-07 16:53 UTC (permalink / raw)
  To: jon.mason
  Cc: jdmason, Jiri Pirko, netdev, yoshfuji, davem, kuznet, jmorris,
	kaber, herbert, eric.dumazet
In-Reply-To: <20131002162730.GA3991@order.stressinduktion.org>

Hi Jon!

Maybe I got the wrong email address for the neterion driver from the
maintainers filer? If you are (still) affiliated with the neterion driver,
maybe you could have a short look at the quoted mail below?

Thanks,

  Hannes

On Wed, Oct 02, 2013 at 06:27:30PM +0200, Hannes Frederic Sowa wrote:
> Hi!
> 
> I have a question regarding UFO and the neterion driver, which as the only one
> advertises hardware UFO support:
> 
> The patch discusses in this thread
> http://thread.gmane.org/gmane.linux.network/284348/focus=285405 could change
> some semantics how packets are constructed before submitted to the driver.
> 
> We currently guarantee that we have the MAC/IP/UDP header in skb->data and the
> payload is attached in the skb's frags. With the changes discussed in this
> thread it is possible that we also append to skb->data some amount of data
> which is not targeted for the header. From reading the driver sources it seems
> the hardware interprets the skb->data to skb_headlen as the header, so we
> could include some data in the fragments more than once.
> 
> Do you think this change is safe? Otherwise I would suggest that the UFO
> capability is switched off until the driver signals the hardware the start and
> end of the headers correctly?
> 
> I left the mail below intact which points to the specific place in s2io.c
> where I think the problem is.
> 
> On Wed, Oct 02, 2013 at 08:14:27AM -0700, Eric Dumazet wrote:
> > On Wed, 2013-10-02 at 15:03 +0200, Hannes Frederic Sowa wrote:
> > > On Wed, Oct 02, 2013 at 02:12:07PM +0200, Hannes Frederic Sowa wrote:
> > > > Hi Eric!
> > > > 
> > > > On Wed, Oct 02, 2013 at 03:41:28AM -0700, Eric Dumazet wrote:
> > > > > On Wed, 2013-10-02 at 10:58 +0200, Jiri Pirko wrote:
> > > > > > Wed, Oct 02, 2013 at 01:25:34AM CEST, hannes@stressinduktion.org wrote:
> > > > > > >-	if (((length > mtu) || (skb && skb_is_gso(skb))) &&
> > > > > > >+	if (((length > mtu) || (skb && skb_has_frags(skb))) &&
> > > > > 
> > > > > > 
> > > > > > This seems correct to me. sk_is_gso would work as well is you apply my
> > > > > > patch "[patch net] ip6_output: do skb ufo init for peeked non ufo skb as
> > > > > > well" which does the setting of gso_size.
> > > > > 
> > > > > Well, skb having frags or not should not be a concern :
> > > > > Thats an allocation choice (lets say to avoid high order allocations). 
> > > > > 
> > > > > Setting gso_size is probably better.
> > > > 
> > > > e89e9cf539a28df7d0eb1d0a545368e9920b34ac ("[IPv4/IPv6]: UFO Scatter-gather
> > > > approach") states:
> > > > 
> > > > "
> > > > skb->data will contain MAC/IP/UDP header and skb_shinfo(skb)->frags[]
> > > > contains the data payload. The skb->ip_summed will be set to CHECKSUM_HW
> > > > indicating that hardware has to do checksum calculation. Hardware should
> > > > compute the UDP checksum of complete datagram and also ip header checksum of
> > > > each fragmented IP packet.
> > > > "
> > > > 
> > > > This is the reason why I tried not to update the gso_size. If it is ok, I am
> > > > fine with that.
> > > 
> > > Especially, drivers/net/ethernet/neterion/s2io.c states that the first dma
> > > mapping (skb->data with skb_headlen, which is fine) is used as the inband
> > > header:
> > > 
> > >         if (offload_type == SKB_GSO_UDP)
> > >                 frg_cnt++; /* as Txd0 was used for inband header */
> > > 
> > > That is my only other hint that we maybe should not update gso_size and
> > > gso_type. I guess software fallback does not have this problem, but I won't
> > > have time to check until this evening.
> > > 
> > > I am really not sure if just setting gso_size does not break neterion UFO
> > > offloading. :/
> > 
> > Well, just ask Jon Mason to double check ;)
> > 
> > I think the commit intent was to set gso_size :
> > 
> >    skb_shinfo(skb)->ufo_size will indicate the length of data part in each IP
> >     fragment going out of the adapter after IP fragmentation by hardware.
> > 
> > The fact that it states "skb->data will contain MAC/IP/UDP header and
> > skb_shinfo(skb)->frags[] contains the data payload." seems irrelevant.
> > 
> > If Neterion driver mandates that skb->head *only* contains the
> > MAC/IP/UDP header, that should be handled in the driver itself.
> 
> Thanks Eric for clearing this up.
> 
> I really thought it would be the common pattern for UFO to have only headers
> in skb->data, so I didn't bother to ask in the first place.
> 
> Thanks,
> 
>   Hannes
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* Re: [PATCH] Fix the upper MTU limit in ipv6 GRE tunnel
From: David Miller @ 2013-10-07 16:53 UTC (permalink / raw)
  To: ou.ghorbel; +Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel
In-Reply-To: <1380880333-3546-1-git-send-email-ou.ghorbel@gmail.com>

From: Oussama Ghorbel <ou.ghorbel@gmail.com>
Date: Fri,  4 Oct 2013 10:52:13 +0100

> Unlike ipv4, the struct member hlen holds the length of the GRE and ipv6
> headers. This length is also counted in dev->hard_header_len.
> Perhaps, it's more clean to modify the hlen to count only the GRE header
> without ipv6 header as the variable name suggest, but the simple way to fix
> this without regression risk is simply modify the calculation of the limit
> in ip6gre_tunnel_change_mtu function.
> Verified in kernel version v3.11.
> 
> Signed-off-by: Oussama Ghorbel <ou.ghorbel@gmail.com>

Please resubmit this with a proper Subject line.

It should be "[PATCH] " then a legitimate subsystem prefix.
In this case "ipv6: " would be appropriate.  And then
the "ipv6" in your existing Subject text is redundant so
can be removed.

Thanks.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox