Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH RFC 0/6] copy aside frags with destructors (was [PATCH 7/9] net: add skb_orphan_frags to copy aside frags with destructors)
From: Michael S. Tsirkin @ 2012-05-09 13:52 UTC (permalink / raw)
  To: Ian Campbell; +Cc: David Miller, netdev@vger.kernel.org, eric.dumazet@gmail.com
In-Reply-To: <1336569529.25514.109.camel@zakaz.uk.xensource.com>

On Wed, May 09, 2012 at 02:18:49PM +0100, Ian Campbell wrote:
> On Mon, 2012-05-07 at 14:53 +0100, Michael S. Tsirkin wrote:
> > On Fri, May 04, 2012 at 11:51:31AM +0100, Ian Campbell wrote:
> > > On Fri, 2012-05-04 at 11:03 +0100, Michael S. Tsirkin wrote:
> > > > On Fri, May 04, 2012 at 02:54:33AM -0400, David Miller wrote:
> > > > > From: "Michael S. Tsirkin" <mst@redhat.com>
> > > > > Date: Fri, 4 May 2012 00:10:24 +0300
> > > > > 
> > > > > > Hmm we orphan skbs when we loop them back so how about reusing the
> > > > > > skb->destructor for this?
> > > > > 
> > > > > That's one possibility.
> > 
> > So originally I thought it would work: destructor would wake the
> > original owner which would do data copy and modify the fragments.  But
> > then I unfortunately realized that would be racy: the new owner could be
> 
> By "new owner" you mean the owner of some other skb which refernences
> the same shinfo (so either clone or the original if we currently hold a
> clone).
> 
> > using the old frags and there appears no way for us to
> > make sure it doesn't so that we can put the original page.
> 
> Right, the key issue is, I think, that cloning etc take an extra dataref
> on the shinfo but do not take references on the frags. This is obviously
> sane but does cause the issue above.
> 
> > And the same logic applies to modifying the frags at any
> > other time if the skb is cloned. So it seems we must copy if we
> > want to clone the skb.
> 
> Right. I had been hoping that the frag destructors could avoid this but
> it seems like this is not going to be the case.
> 
> Just as a thought experiment would taking a frag ref on clone help us?
> (lets assume for now that, iff there are destructors, this is cheaper
> than the copy). I think this doesn't work -- since although the ref will
> mean that the page doesn't go away under anyone elses feet after we've
> orphaned it we will have lost the pointer to the original page by the
> time that other ref is dropped, so freeing it will be tricky.
> 
> So following that train of thought a step further -- would creating a
> new shinfo entirely on frag orphan buy us anything (and at what cost)?
> Obviously it an extra allocation -- but we are already allocating N
> pages of new frag. The others skb's referencing the shared shinfo would
> still continue to see the old shinfo, pages and refs until they
> themselves needed to orphan the frags (if they do, they might avoid it).

So it would address the second problem (cloned skb) but
not the first one (original owner racing with the first one).

> I think that could work, but I'm not sure it is worth the complexity.
> Basically all it means that if you are bridging + flooding then you
> would potentially save some number of a copies depending on the types of
> devices on each port.
> 
> [...]
> 
> > Second option is what macvtap zero copy uses and it already
> > does copy on clone too. So I hacked that to make it support tcp/udp
> > used by sunrpc.
> 
> This does seem like it is the preferable solution.
> 
> I haven't read the patches yet, so maybe you already support this, but
> it would be good to allow the creator of an skb to declare that it
> doesn't want this behaviour, e.g. because it has some other mechanism
> for dealing with this copy out for the case where an skb gets held
> somewhere.
> 
> Ian.

Sure, if we find a way to do that we just don't set ZEROCOPY skb flag.

So I'll give people a bit more time to review, hope
you find the time too.

> > > > > 
> > > > > But I fear we're about to toss Ian into yet another rabbit hole. :-)
> > > > > 
> > > > > Let's try to converge on something quickly as I think integration of
> > > > > his work has been delayed enough as-is.
> > 
> > ...
> > 
> > > > It's weekend here, I'll work on a patch like this
> > > > Sunday.
> > > 
> > > Thanks, I was starting to feel my nose twitching and my ears beginning
> > > to elongate ;-)
> > > 
> > > Ian.
> > 
> > Here's a first stub at a fix. Basically to be able to modify frags on
> > the fly we must make sure the skb isn't cloned, so the moment someone
> > clones the skb we need to trigger the frag copy logic.  And this is
> > exactly what happens with SKBTX_DEV_ZEROCOPY so it seems to make sense
> > to reuse that logic.
> > 
> > The below patchset replaces patch 7
> > ([PATCH 7/9] net: add skb_orphan_frags to copy aside frags with destructors)
> > in Ian's patchset and needs to be applied there.
> > 
> > 
> > Compiled only but I'd like to hear what people think all the same
> > because it does add a couple of branches on fast path.  On the other
> > hand this makes it generic so the same logic will be reusable for packet
> > sockets (which IIRC are currently buggy in the same way as sunrpc) and
> > for adding zero copy support to tun.
> > 
> > Please comment,
> > Thanks!
> > 
> > -- 
> > MST
> > 
> > 
> > Michael S. Tsirkin (6):
> >   skbuff: support per-page destructors in copy_ubufs
> >   skbuff: add an api to orphan frags
> >   skbuff: convert to skb_orphan_frags
> >   tun: orphan frags on xmit
> >   net: orphan frags on receive
> >   skbuff: set zerocopy flag on frag destructor
> > 
> >  drivers/net/tun.c      |    2 ++
> >  include/linux/skbuff.h |   41 +++++++++++++++++++++++++++++++++++++++++
> >  net/core/dev.c         |    2 ++
> >  net/core/skbuff.c      |   43 +++++++++++++++++++++++++------------------
> >  4 files changed, 70 insertions(+), 18 deletions(-)
> > 
> 

^ permalink raw reply

* Re: [PATCH RFC 5/6] net: orphan frags on receive
From: Ian Campbell @ 2012-05-09 13:54 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: David Miller, netdev@vger.kernel.org, eric.dumazet@gmail.com
In-Reply-To: <a3353f6235d877c6b2d300eb5a914dbc9b78824d.1336397823.git.mst@redhat.com>

On Mon, 2012-05-07 at 14:54 +0100, Michael S. Tsirkin wrote:
> zero copy packets are normally sent to the outside
> network, but bridging, tun etc might loop them
> back to host networking stack. If this happens
> destructors will never be called, so orphan
> the frags immediately on receive.

I think this deceptively simply patch is actually the meat of the
series.

It's been a long time since I dug into the bridging code. Am I right
that this function is only reached when an SKB is going to be delivered
locally and not when forwarding to another port on the bridge?

Ian.

> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  net/core/dev.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index a2be59f..c0cdc00 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1630,6 +1630,8 @@ static inline int deliver_skb(struct sk_buff *skb,
>  			      struct packet_type *pt_prev,
>  			      struct net_device *orig_dev)
>  {
> +	if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
> +		return -ENOMEM;
>  	atomic_inc(&skb->users);
>  	return pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
>  }

^ permalink raw reply

* Re: [PATCH RFC 0/6] copy aside frags with destructors (was [PATCH 7/9] net: add skb_orphan_frags to copy aside frags with destructors)
From: Ian Campbell @ 2012-05-09 14:01 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: David Miller, netdev@vger.kernel.org, eric.dumazet@gmail.com
In-Reply-To: <20120509135240.GA19246@redhat.com>

On Wed, 2012-05-09 at 14:52 +0100, Michael S. Tsirkin wrote:
> On Wed, May 09, 2012 at 02:18:49PM +0100, Ian Campbell wrote:
> > On Mon, 2012-05-07 at 14:53 +0100, Michael S. Tsirkin wrote:
> > > On Fri, May 04, 2012 at 11:51:31AM +0100, Ian Campbell wrote:
> > > > On Fri, 2012-05-04 at 11:03 +0100, Michael S. Tsirkin wrote:
> > > > > On Fri, May 04, 2012 at 02:54:33AM -0400, David Miller wrote:
> > > > > > From: "Michael S. Tsirkin" <mst@redhat.com>
> > > > > > Date: Fri, 4 May 2012 00:10:24 +0300
> > > > > > 
> > > > > > > Hmm we orphan skbs when we loop them back so how about reusing the
> > > > > > > skb->destructor for this?
> > > > > > 
> > > > > > That's one possibility.
> > > 
> > > So originally I thought it would work: destructor would wake the
> > > original owner which would do data copy and modify the fragments.  But
> > > then I unfortunately realized that would be racy: the new owner could be
> > 
> > By "new owner" you mean the owner of some other skb which refernences
> > the same shinfo (so either clone or the original if we currently hold a
> > clone).
> > 
> > > using the old frags and there appears no way for us to
> > > make sure it doesn't so that we can put the original page.
> > 
> > Right, the key issue is, I think, that cloning etc take an extra dataref
> > on the shinfo but do not take references on the frags. This is obviously
> > sane but does cause the issue above.
> > 
> > > And the same logic applies to modifying the frags at any
> > > other time if the skb is cloned. So it seems we must copy if we
> > > want to clone the skb.
> > 
> > Right. I had been hoping that the frag destructors could avoid this but
> > it seems like this is not going to be the case.
> > 
> > Just as a thought experiment would taking a frag ref on clone help us?
> > (lets assume for now that, iff there are destructors, this is cheaper
> > than the copy). I think this doesn't work -- since although the ref will
> > mean that the page doesn't go away under anyone elses feet after we've
> > orphaned it we will have lost the pointer to the original page by the
> > time that other ref is dropped, so freeing it will be tricky.
> > 
> > So following that train of thought a step further -- would creating a
> > new shinfo entirely on frag orphan buy us anything (and at what cost)?
> > Obviously it an extra allocation -- but we are already allocating N
> > pages of new frag. The others skb's referencing the shared shinfo would
> > still continue to see the old shinfo, pages and refs until they
> > themselves needed to orphan the frags (if they do, they might avoid it).
> 
> So it would address the second problem (cloned skb) but
> not the first one (original owner racing with the first one).

Right.

> > I think that could work, but I'm not sure it is worth the complexity.
> > Basically all it means that if you are bridging + flooding then you
> > would potentially save some number of a copies depending on the types of
> > devices on each port.
> > 
> > [...]
> > 
> > > Second option is what macvtap zero copy uses and it already
> > > does copy on clone too. So I hacked that to make it support tcp/udp
> > > used by sunrpc.
> > 
> > This does seem like it is the preferable solution.
> > 
> > I haven't read the patches yet, so maybe you already support this, but
> > it would be good to allow the creator of an skb to declare that it
> > doesn't want this behaviour, e.g. because it has some other mechanism
> > for dealing with this copy out for the case where an skb gets held
> > somewhere.
> > 
> > Ian.
> 
> Sure, if we find a way to do that we just don't set ZEROCOPY skb flag.

After reading "net: orphan frags on receive" I'm not sure how necessary
it will be, since the main case I was worried about was lots of
copying/orphaning on bridge forwarding, but I convinced myself (I think)
that the series only effects delivery and not forwarding.

Still, no harm in offering the option.

> So I'll give people a bit more time to review, hope
> you find the time too.

I took a look at a fairly high level and it all looked sensible, I've
not looked closely at the details, not run it yet, I hope I can do that
shortly.

BTW, I never said thanks for looking into this, so thanks ;-)

^ permalink raw reply

* Re: [net-next 5/9] e1000e: Disable ASPM L1 on 82574
From: Nix @ 2012-05-09 14:02 UTC (permalink / raw)
  To: Wyborny, Carolyn
  Cc: Matthew Garrett, Kirsher, Jeffrey T, davem@davemloft.net,
	Chris Boot, netdev@vger.kernel.org, gospo@redhat.com,
	sassmann@redhat.com
In-Reply-To: <87fwbea8pi.fsf@spindle.srvr.nix>

On 5 May 2012, nix@esperi.org.uk outgrape:

> The question here is how to fix it. It appears that the motherboard or
> BIOS on this machine does not grant _OSC control even (especially?) if
> you have turned on PCIe ASPM in the BIOS. But perhaps even if _OSC is
> not granted you should permit PCIe to be *disabled* by drivers, just not
> enabled? (The BIOS appears to be buggy in this area: if you turn off
> ASPM, save, and go back into setup, ASPM has turned itself back on
> again!)

This turned out to be me not knowing how to drive the BIOS's deeply
unintuitive configuration program. If I turn PCIe ASPM off in the BIOS,
the kernel does exactly the same as it does in my previous message (i.e.
decides that ASPM is disabled due to the failure of an _OSC request,
then refuses to change the ASPM link state of the e1000e), but since the
BIOS has already disabled ASPM, the card is not in crash-happy mode and
I don't need to force anything off by hand.

But if I turn ASPM on, as reported in my previous message the kernel
promptly bans itself from changing any PCIe ASPM link states whatsoever,
and the e1000e locks up about an hour later.

I presume that

May  5 17:06:53 spindle info: [    0.629699]  pci0000:00: Requesting ACPI _OSC control (0x1d)
May  5 17:06:53 spindle info: [    0.629941]  pci0000:00: ACPI _OSC request failed (AE_NOT_FOUND), returned control mask: 0x1d
May  5 17:06:53 spindle info: [    0.630373] ACPI _OSC control for PCIe not granted, disabling ASPM

is reporting some sort of BIOS bug, but it is at best confusing to have
the boot messages reporting that ASPM is disabled: better perhaps to
describe this as 'leaving ASPM on all devices how the BIOS set it' and
have the e1000e driver emit a giant flaming warning if it spots this
happening on an 82574L with ASPM turned on. (Or, alternatively, permit
ASPM to be turned off when the system is in this state, but not on,
whereupon the existing code in the e1000e driver will do the right
thing. But I don't know if that will break any laptops. This machine is
very much not a laptop.)

> I'm not sure what the right thing to do is here: I don't know enough
> about this area. But it does seem very strange that the only way I have
> to turn off PCIe ASPM reliably on this device is to tell the kernel to
> forcibly turn it *on*!

This is still strange, though it seems that turning ASPM completely off
in the BIOS will also serve.

^ permalink raw reply

* RE: [PATCH] pch_gbe: Adding read memory barriers
From: David Laight @ 2012-05-09 14:44 UTC (permalink / raw)
  To: Erwan Velu, Ben Hutchings; +Cc: netdev, linux-kernel, tshimizu818
In-Reply-To: <CAL2JzuxirjAnmbcV+R=MJFAi1+ayrRnM6i0Dz5poqMuG7Cxwxg@mail.gmail.com>

> So I've been reading how does others drivers are working regarding
this type of message.
> While reading many drivers, I found that many does rmb() calls at this
points.
> As I'm not a linux kernel expert, I did mimic them to the result
> and it was good since the errors disappeared.

> So do you mean the patch is wrong or shall I rework only the comments
?

The problem is that barriers are not understood at all well
by most driver writers - so a lot of code is sub-optimal.

It isn't helped by the readability of the cpu documentation,
never mind some old docs that described something that an
engineer thought they might want to allow - rather than
describing what any cpus actually did/do.

For x86 cpus (amd and intel at least) reads and writes to
uncached memory and io are guaranteed to happen in instruction
order. For other cpus (sparc and ppc) such reads can
overtake writes - requiring something to force the write to
complete (eg a synchronising instruction, or maybe a readeback).

One problem is that the barrier instructions are generally
slow - so you don't want to use them where unnecessary,
however whether you need them is somewhat hardware specific
and the typical OS lists of barriers doesn't cover all the
cases - even for things that are actually compile time knowable.
Eg: what you need between a write to 'dma coherent' memory
and a write to a control register depends on whether 'dma
coherency' is implemented by disabling the cache.

For SMP there may be additional synchronisation, especially
for cached data, but for most drivers that is handled by
mutex/lock operations.

The other, and separate, issue is making sure that the compiler
itself doesn't reorder of optimise away memory cycles.
By far the best way is to mark the data (not the pointer to the
data) 'volatile', this forces the compiler to generate object
code for every variable reference in the source, and in the right
order.
Sometimes that is inappropiate, gcc's asm volatile (:::"memory")
tells the compiler that this asm statement might modify all
of memory (even though there are actually no instructions!).
This forces it to avoid having anything in a register that is
a copy of something in memory - thus enforcing the order of
memory accesses.
It can also be used to reduce register pressure within the
compiler.

In your case I suspect that the compiler has re-ordered the
reads - adding almost anything between them will change the
order.

	David

^ permalink raw reply

* [PATCH] netlink: connector: implement cn_netlink_reply
From: Alban Crequy @ 2012-05-09 14:37 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: netdev, Vincent Sanders, Javier Martinez Canillas, Alban Crequy,
	Rodrigo Moya, Evgeniy Polyakov

In a connector callback, it was not possible to reply to a message only to a
sender. This patch implements cn_netlink_reply(). It uses the connector socket
to send an unicast netlink message back to the sender.

The following pseudo-code can be used from a connector callback:

        struct cn_msg *cn_reply;
        cn_reply = kzalloc(sizeof(struct cn_msg)
                + sizeof(struct ..._nl_cfg_reply), GFP_KERNEL);

        cn_reply->id = msg->id;
        cn_reply->seq = msg->seq;
        cn_reply->ack = msg->ack  + 1;
        cn_reply->len = sizeof(struct ..._nl_cfg_reply);
        cn_reply->flags = 0;

        rr = cn_netlink_reply(cn_reply, nsp->pid, GFP_KERNEL);

Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
Reviewed-by: Vincent Sanders <vincent.sanders@collabora.co.uk>
Acked-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
CC: Rodrigo Moya <rodrigo.moya@collabora.co.uk>
CC: Evgeniy Polyakov <zbr@ioremap.net>
---
 drivers/connector/connector.c |   32 ++++++++++++++++++++++++++++++++
 include/linux/connector.h     |    1 +
 2 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c
index dde6a0f..1cb488d 100644
--- a/drivers/connector/connector.c
+++ b/drivers/connector/connector.c
@@ -118,6 +118,38 @@ nlmsg_failure:
 EXPORT_SYMBOL_GPL(cn_netlink_send);
 
 /*
+ * Send an unicast reply from a connector callback
+ *
+ */
+int cn_netlink_reply(struct cn_msg *msg, u32 pid, gfp_t gfp_mask)
+{
+	unsigned int size;
+	struct sk_buff *skb;
+	struct nlmsghdr *nlh;
+	struct cn_msg *data;
+	struct cn_dev *dev = &cdev;
+
+	size = NLMSG_SPACE(sizeof(*msg) + msg->len);
+
+	skb = alloc_skb(size, gfp_mask);
+	if (!skb)
+		return -ENOMEM;
+
+	nlh = NLMSG_PUT(skb, 0, msg->seq, NLMSG_DONE, size - sizeof(*nlh));
+
+	data = NLMSG_DATA(nlh);
+
+	memcpy(data, msg, sizeof(*data) + msg->len);
+
+	return netlink_unicast(dev->nls, skb, pid, 1);
+
+nlmsg_failure:
+	kfree_skb(skb);
+	return -EINVAL;
+}
+EXPORT_SYMBOL_GPL(cn_netlink_reply);
+
+/*
  * Callback helper - queues work and setup destructor for given data.
  */
 static int cn_call_callback(struct sk_buff *skb)
diff --git a/include/linux/connector.h b/include/linux/connector.h
index 7638407..c27be60 100644
--- a/include/linux/connector.h
+++ b/include/linux/connector.h
@@ -125,6 +125,7 @@ int cn_add_callback(struct cb_id *id, const char *name,
 		    void (*callback)(struct cn_msg *, struct netlink_skb_parms *));
 void cn_del_callback(struct cb_id *);
 int cn_netlink_send(struct cn_msg *, u32, gfp_t);
+int cn_netlink_reply(struct cn_msg *, u32, gfp_t);
 
 int cn_queue_add_callback(struct cn_queue_dev *dev, const char *name,
 			  struct cb_id *id,
-- 
1.7.2.5

^ permalink raw reply related

* Re: [PATCH RFC 0/6] copy aside frags with destructors (was [PATCH 7/9] net: add skb_orphan_frags to copy aside frags with destructors)
From: Michael S. Tsirkin @ 2012-05-09 14:56 UTC (permalink / raw)
  To: Ian Campbell; +Cc: David Miller, netdev@vger.kernel.org, eric.dumazet@gmail.com
In-Reply-To: <20120509135240.GA19246@redhat.com>

On Wed, May 09, 2012 at 04:52:40PM +0300, Michael S. Tsirkin wrote:
> On Wed, May 09, 2012 at 02:18:49PM +0100, Ian Campbell wrote:
> > On Mon, 2012-05-07 at 14:53 +0100, Michael S. Tsirkin wrote:
> > > On Fri, May 04, 2012 at 11:51:31AM +0100, Ian Campbell wrote:
> > > > On Fri, 2012-05-04 at 11:03 +0100, Michael S. Tsirkin wrote:
> > > > > On Fri, May 04, 2012 at 02:54:33AM -0400, David Miller wrote:
> > > > > > From: "Michael S. Tsirkin" <mst@redhat.com>
> > > > > > Date: Fri, 4 May 2012 00:10:24 +0300
> > > > > > 
> > > > > > > Hmm we orphan skbs when we loop them back so how about reusing the
> > > > > > > skb->destructor for this?
> > > > > > 
> > > > > > That's one possibility.
> > > 
> > > So originally I thought it would work: destructor would wake the
> > > original owner which would do data copy and modify the fragments.  But
> > > then I unfortunately realized that would be racy: the new owner could be
> > 
> > By "new owner" you mean the owner of some other skb which refernences
> > the same shinfo (so either clone or the original if we currently hold a
> > clone).

Sorry didn't clarify this well enough. We often do something like
the following

	orphan
	set owner

new owner is what is set afterwards.

> > > using the old frags and there appears no way for us to
> > > make sure it doesn't so that we can put the original page.
> > 
> > Right, the key issue is, I think, that cloning etc take an extra dataref
> > on the shinfo but do not take references on the frags. This is obviously
> > sane but does cause the issue above.
> > 
> > > And the same logic applies to modifying the frags at any
> > > other time if the skb is cloned. So it seems we must copy if we
> > > want to clone the skb.
> > 
> > Right. I had been hoping that the frag destructors could avoid this but
> > it seems like this is not going to be the case.
> > 
> > Just as a thought experiment would taking a frag ref on clone help us?
> > (lets assume for now that, iff there are destructors, this is cheaper
> > than the copy). I think this doesn't work -- since although the ref will
> > mean that the page doesn't go away under anyone elses feet after we've
> > orphaned it we will have lost the pointer to the original page by the
> > time that other ref is dropped, so freeing it will be tricky.
> > 
> > So following that train of thought a step further -- would creating a
> > new shinfo entirely on frag orphan buy us anything (and at what cost)?
> > Obviously it an extra allocation -- but we are already allocating N
> > pages of new frag. The others skb's referencing the shared shinfo would
> > still continue to see the old shinfo, pages and refs until they
> > themselves needed to orphan the frags (if they do, they might avoid it).
> 
> So it would address the second problem (cloned skb) but
> not the first one (original owner racing with the first one).

Oh and there's another issue: some code assumes that clones
are always identical. For example tcpdump
might show wrong packet content if the page is modified
while packet gets through networking core, so a malicious
application might confuse anything that relies on
this data to do e.g. security filtering.

I don't know whether anyone cares about tcpdump output
being always correct but it seems nicer to side-step the issue.


> > I think that could work, but I'm not sure it is worth the complexity.
> > Basically all it means that if you are bridging + flooding then you
> > would potentially save some number of a copies depending on the types of
> > devices on each port.
> > 
> > [...]
> > 
> > > Second option is what macvtap zero copy uses and it already
> > > does copy on clone too. So I hacked that to make it support tcp/udp
> > > used by sunrpc.
> > 
> > This does seem like it is the preferable solution.
> > 
> > I haven't read the patches yet, so maybe you already support this, but
> > it would be good to allow the creator of an skb to declare that it
> > doesn't want this behaviour, e.g. because it has some other mechanism
> > for dealing with this copy out for the case where an skb gets held
> > somewhere.
> > 
> > Ian.
> 
> Sure, if we find a way to do that we just don't set ZEROCOPY skb flag.
> 
> So I'll give people a bit more time to review, hope
> you find the time too.
> 
> > > > > > 
> > > > > > But I fear we're about to toss Ian into yet another rabbit hole. :-)
> > > > > > 
> > > > > > Let's try to converge on something quickly as I think integration of
> > > > > > his work has been delayed enough as-is.
> > > 
> > > ...
> > > 
> > > > > It's weekend here, I'll work on a patch like this
> > > > > Sunday.
> > > > 
> > > > Thanks, I was starting to feel my nose twitching and my ears beginning
> > > > to elongate ;-)
> > > > 
> > > > Ian.
> > > 
> > > Here's a first stub at a fix. Basically to be able to modify frags on
> > > the fly we must make sure the skb isn't cloned, so the moment someone
> > > clones the skb we need to trigger the frag copy logic.  And this is
> > > exactly what happens with SKBTX_DEV_ZEROCOPY so it seems to make sense
> > > to reuse that logic.
> > > 
> > > The below patchset replaces patch 7
> > > ([PATCH 7/9] net: add skb_orphan_frags to copy aside frags with destructors)
> > > in Ian's patchset and needs to be applied there.
> > > 
> > > 
> > > Compiled only but I'd like to hear what people think all the same
> > > because it does add a couple of branches on fast path.  On the other
> > > hand this makes it generic so the same logic will be reusable for packet
> > > sockets (which IIRC are currently buggy in the same way as sunrpc) and
> > > for adding zero copy support to tun.
> > > 
> > > Please comment,
> > > Thanks!
> > > 
> > > -- 
> > > MST
> > > 
> > > 
> > > Michael S. Tsirkin (6):
> > >   skbuff: support per-page destructors in copy_ubufs
> > >   skbuff: add an api to orphan frags
> > >   skbuff: convert to skb_orphan_frags
> > >   tun: orphan frags on xmit
> > >   net: orphan frags on receive
> > >   skbuff: set zerocopy flag on frag destructor
> > > 
> > >  drivers/net/tun.c      |    2 ++
> > >  include/linux/skbuff.h |   41 +++++++++++++++++++++++++++++++++++++++++
> > >  net/core/dev.c         |    2 ++
> > >  net/core/skbuff.c      |   43 +++++++++++++++++++++++++------------------
> > >  4 files changed, 70 insertions(+), 18 deletions(-)
> > > 
> > 

^ permalink raw reply

* Re: [PATCH 04/13] bridge: netfilter: Convert compare_ether_addr to ether_addr_equal
From: Stephen Hemminger @ 2012-05-09 15:07 UTC (permalink / raw)
  To: Joe Perches
  Cc: David S. Miller, Bart De Schuymer, Pablo Neira Ayuso,
	Patrick McHardy, netfilter-devel, netfilter, coreteam, bridge,
	netdev, linux-kernel
In-Reply-To: <5a91d328db3393320170723cdfc9395ad59182c6.1336538938.git.joe@perches.com>

On Tue,  8 May 2012 21:56:48 -0700
Joe Perches <joe@perches.com> wrote:

> Use the new bool function ether_addr_equal to add
> some clarity and reduce the likelihood for misuse
> of compare_ether_addr for sorting.
> 
> Done via cocci script:
> 
> $ cat compare_ether_addr.cocci
> @@
> expression a,b;
> @@
> -	!compare_ether_addr(a, b)
> +	ether_addr_equal(a, b)
> 
> @@
> expression a,b;
> @@
> -	compare_ether_addr(a, b)
> +	!ether_addr_equal(a, b)
> 
> @@
> expression a,b;
> @@
> -	!ether_addr_equal(a, b) == 0
> +	ether_addr_equal(a, b)
> 
> @@
> expression a,b;
> @@
> -	!ether_addr_equal(a, b) != 0
> +	!ether_addr_equal(a, b)
> 
> @@
> expression a,b;
> @@
> -	ether_addr_equal(a, b) == 0
> +	!ether_addr_equal(a, b)
> 
> @@
> expression a,b;
> @@
> -	ether_addr_equal(a, b) != 0
> +	ether_addr_equal(a, b)
> 
> @@
> expression a,b;
> @@
> -	!!ether_addr_equal(a, b)
> +	ether_addr_equal(a, b)
> 
> Signed-off-by: Joe Perches <joe@perches.com>
> ---
>  net/bridge/netfilter/ebt_stp.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/bridge/netfilter/ebt_stp.c b/net/bridge/netfilter/ebt_stp.c
> index 5b33a2e..071d872 100644
> --- a/net/bridge/netfilter/ebt_stp.c
> +++ b/net/bridge/netfilter/ebt_stp.c
> @@ -164,8 +164,8 @@ static int ebt_stp_mt_check(const struct xt_mtchk_param *par)
>  	    !(info->bitmask & EBT_STP_MASK))
>  		return -EINVAL;
>  	/* Make sure the match only receives stp frames */
> -	if (compare_ether_addr(e->destmac, bridge_ula) ||
> -	    compare_ether_addr(e->destmsk, msk) || !(e->bitmask & EBT_DESTMAC))
> +	if (!ether_addr_equal(e->destmac, bridge_ula) ||
> +	    !ether_addr_equal(e->destmsk, msk) || !(e->bitmask & EBT_DESTMAC))
>  		return -EINVAL;
>  
>  	return 0;

All look good.
Acked-by: Stephen Hemminger <shemminger@vyatta.com>


^ permalink raw reply

* Re: [PATCH RFC 5/6] net: orphan frags on receive
From: Michael S. Tsirkin @ 2012-05-09 15:12 UTC (permalink / raw)
  To: Ian Campbell; +Cc: David Miller, netdev@vger.kernel.org, eric.dumazet@gmail.com
In-Reply-To: <1336571692.25514.122.camel@zakaz.uk.xensource.com>

On Wed, May 09, 2012 at 02:54:52PM +0100, Ian Campbell wrote:
> On Mon, 2012-05-07 at 14:54 +0100, Michael S. Tsirkin wrote:
> > zero copy packets are normally sent to the outside
> > network, but bridging, tun etc might loop them
> > back to host networking stack. If this happens
> > destructors will never be called, so orphan
> > the frags immediately on receive.
> 
> I think this deceptively simply patch is actually the meat of the
> series.
> It's been a long time since I dug into the bridging code. Am I right
> that this function is only reached when an SKB is going to be delivered
> locally and not when forwarding to another port on the bridge?
> 
> Ian.

I think so - bridge uses netdev_rx_handler_unregister
so it does not go through packet handlers.

> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  net/core/dev.c |    2 ++
> >  1 files changed, 2 insertions(+), 0 deletions(-)
> > 
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index a2be59f..c0cdc00 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -1630,6 +1630,8 @@ static inline int deliver_skb(struct sk_buff *skb,
> >  			      struct packet_type *pt_prev,
> >  			      struct net_device *orig_dev)
> >  {
> > +	if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
> > +		return -ENOMEM;
> >  	atomic_inc(&skb->users);
> >  	return pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
> >  }
> 

^ permalink raw reply

* Re: [PATCH] net: skb_set_dev do not unconditionally drop ref to dst
From: Stephen Hemminger @ 2012-05-09 15:16 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Frank Blaschka, davem, netdev, linux-s390, Arnd Bergmann
In-Reply-To: <1335765093.2296.4.camel@edumazet-glaptop>

On Mon, 30 Apr 2012 07:51:33 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Mon, 2012-04-30 at 07:38 +0200, Frank Blaschka wrote:
> > From: Frank Blaschka <frank.blaschka@de.ibm.com>
> > 
> > commit 8a83a00b0735190384a348156837918271034144 unconditionally
> > drops dst reference when skb->dev is set. This causes a regression
> > with VLAN and the qeth_l3 network driver. qeth_l3 can not get gw
> > information from the skb coming from the vlan driver. It is only
> > valid to drop the dst in case of different name spaces.
> > 
> > Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
> > ---
> >  net/core/dev.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -1881,8 +1881,8 @@ EXPORT_SYMBOL(netif_device_attach);
> >  #ifdef CONFIG_NET_NS
> >  void skb_set_dev(struct sk_buff *skb, struct net_device *dev)
> >  {
> > -	skb_dst_drop(skb);
> >  	if (skb->dev && !net_eq(dev_net(skb->dev), dev_net(dev))) {
> > +		skb_dst_drop(skb);
> >  		secpath_reset(skb);
> >  		nf_reset(skb);
> >  		skb_init_secmark(skb);
> > 
> 
> You forgot CC Arnd Bergmann <arnd@arndb.de> ?
> 
> But we do want to do the skb_dst_drop() in dev_forward_skb()
> 
> Your patch breaks dev_forward_skb() then.
> 
> But apparently this forward path was alredy broken in Arnd patch...

Is this related to why, PMTU discover is broken now over GRE.
The simple case of doing a TCP transfer from local host over
GRE tunnel hangs.

What happens is that skb_dst(skb) is null in ip_tunnel_xmit.
Which leads to the MTU not being updated, and the ICMP_FRAG_NEEDED
is never sent.

^ permalink raw reply

* Re: [PATCH RFC 0/6] copy aside frags with destructors (was [PATCH 7/9] net: add skb_orphan_frags to copy aside frags with destructors)
From: Michael S. Tsirkin @ 2012-05-09 15:27 UTC (permalink / raw)
  To: Ian Campbell; +Cc: David Miller, netdev@vger.kernel.org, eric.dumazet@gmail.com
In-Reply-To: <1336572107.25514.127.camel@zakaz.uk.xensource.com>

On Wed, May 09, 2012 at 03:01:47PM +0100, Ian Campbell wrote:
> I took a look at a fairly high level and it all looked sensible, I've
> not looked closely at the details, not run it yet, I hope I can do that
> shortly.

In particular you can trace skb_copy_ubufs to verify that
it's called in the scenario where we want a copybreak
and not called where unexpected e.g. in bridge forwarding.

^ permalink raw reply

* Re: [PATCH] pch_gbe: Adding read memory barriers
From: Ben Hutchings @ 2012-05-09 15:39 UTC (permalink / raw)
  To: Erwan Velu; +Cc: David Laight, netdev, linux-kernel, tshimizu818
In-Reply-To: <CAL2JzuxawnesC+=Fqyu71Hkwi5gnRughe_NYM4k1c=zqDoNGRg@mail.gmail.com>

On Wed, 2012-05-09 at 17:20 +0200, Erwan Velu wrote:
> Thanks for this very clear description.

It might be clear, but it's not very relevant: you need to understand
the Linux kernel barrier APIs (which cover all architectures).  See
Documentation/memory-barriers.txt.

> As I'm pretty new to this kind of integration in the kernel, do you
> request me to test this volatile option ?
> Does the patch tends to be rejected for this issue ?
> What should be the next step for my patch ?
[...]

Explain what kind of reordering you are trying to avoid.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH 3.3.0] phy:icplus:fix Auto Power Saving in ip101a_config_init.
From: David Miller @ 2012-05-09 15:57 UTC (permalink / raw)
  To: jrnieder; +Cc: srinivas.kandagatla, netdev, peppe.cavallaro
In-Reply-To: <20120509072537.GA437@burratino>

From: Jonathan Nieder <jrnieder@gmail.com>
Date: Wed, 9 May 2012 02:25:37 -0500

> Hi Dave,
> 
> David Miller wrote:
>> From: Srinivas KANDAGATLA <srinivas.kandagatla@st.com>
> 
>>> This patch fixes Auto Power Saving configuration in ip101a_config_init
>>> which was broken as there is no phy register write followed after
>>> setting IP101A_APS_ON flag.
>>>
>>> This patch also fixes the return value of ip101a_config_init.
>>>
>>> Without this patch ip101a_config_init returns 2 which is not an error
>>> accroding to IS_ERR and the mac driver will continue accessing 2 as
>>> valid pointer to phy_dev resulting in memory fault.
>>>
>>> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
>>
>> Applied and queued up for -stable, thanks.
> 
> I'm guessing 3.2.y needs this, too, since it also includes
> v3.2-rc1~129^2~247 ("net/phy: add IC+ IP101A and support APS").
> 
> Here's a quick backport made by adjusting the context line to
> use IP101A_APS_ON instead of IP101A_G_APS_ON.
> 
> Please include it in some later stable update if appropriate.

Please submit this directly to the stable list.

^ permalink raw reply

* Re: [net-next] e1000e: Fix merge conflict (net->net-next)
From: David Miller @ 2012-05-09 16:07 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, sassmann
In-Reply-To: <1336555426-13934-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Wed,  9 May 2012 02:23:46 -0700

> During merge of net to net-next the changes in patch:
> 
> e1000e: Fix default interrupt throttle rate not set in NIC HW
> 
> got munged in param.c of the e1000e driver.  This rectifies the
> merge issues.
> 
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied, thanks Jeff.

^ permalink raw reply

* [PATCHv2 net-next 2/4] netxen_nic: Allow only useful and recommended firmware dump capture mask values
From: Rajesh Borundia @ 2012-05-09 15:55 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman, Manish chopra
In-Reply-To: <1336578930-25141-1-git-send-email-rajesh.borundia@qlogic.com>

From: Manish chopra <manish.chopra@qlogic.com>

o 0x3, 0x7, 0xF, 0x1F, 0x3F, 0x7F and 0xFF are the allowed capture masks.

Signed-off-by: Manish chopra <manish.chopra@qlogic.com>
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
---
 drivers/net/ethernet/qlogic/netxen/netxen_nic.h    |    3 +++
 .../ethernet/qlogic/netxen/netxen_nic_ethtool.c    |   18 ++++++++++--------
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic.h b/drivers/net/ethernet/qlogic/netxen/netxen_nic.h
index b5de8a7..ecc8d17 100644
--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic.h
+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic.h
@@ -1201,6 +1201,9 @@ typedef struct {
 #define NX_FORCE_FW_RESET               0xdeaddead
 
 
+/* Fw dump levels */
+static const u32 FW_DUMP_LEVELS[] = { 0x3, 0x7, 0xf, 0x1f, 0x3f, 0x7f, 0xff };
+
 /* Flash read/write address */
 #define NX_FW_DUMP_REG1         0x00130060
 #define NX_FW_DUMP_REG2         0x001e0000
diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_ethtool.c b/drivers/net/ethernet/qlogic/netxen/netxen_nic_ethtool.c
index 8c39299..3973040 100644
--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_ethtool.c
+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_ethtool.c
@@ -834,7 +834,7 @@ netxen_get_dump_flag(struct net_device *netdev, struct ethtool_dump *dump)
 static int
 netxen_set_dump(struct net_device *netdev, struct ethtool_dump *val)
 {
-	int ret = 0;
+	int i;
 	struct netxen_adapter *adapter = netdev_priv(netdev);
 	struct netxen_minidump *mdump = &adapter->mdump;
 
@@ -844,7 +844,7 @@ netxen_set_dump(struct net_device *netdev, struct ethtool_dump *val)
 			mdump->md_enabled = 1;
 		if (adapter->fw_mdump_rdy) {
 			netdev_info(netdev, "Previous dump not cleared, not forcing dump\n");
-			return ret;
+			return 0;
 		}
 		netdev_info(netdev, "Forcing a fw dump\n");
 		nx_dev_request_reset(adapter);
@@ -867,19 +867,21 @@ netxen_set_dump(struct net_device *netdev, struct ethtool_dump *val)
 		adapter->flags &= ~NETXEN_FW_RESET_OWNER;
 		break;
 	default:
-		if (val->flag <= NX_DUMP_MASK_MAX &&
-			val->flag >= NX_DUMP_MASK_MIN) {
-			mdump->md_capture_mask = val->flag & 0xff;
-			netdev_info(netdev, "Driver mask changed to: 0x%x\n",
+		for (i = 0; i < ARRAY_SIZE(FW_DUMP_LEVELS); i++) {
+			if (val->flag == FW_DUMP_LEVELS[i]) {
+				mdump->md_capture_mask = val->flag;
+				netdev_info(netdev,
+					"Driver mask changed to: 0x%x\n",
 					mdump->md_capture_mask);
-			break;
+				return 0;
+			}
 		}
 		netdev_info(netdev,
 			"Invalid dump level: 0x%x\n", val->flag);
 		return -EINVAL;
 	}
 
-	return ret;
+	return 0;
 }
 
 static int
-- 
1.6.3.3

^ permalink raw reply related

* [PATCHv2 net-next 1/4] netxen_nic: disable minidump by default
From: Rajesh Borundia @ 2012-05-09 15:55 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman, Sritej Velaga
In-Reply-To: <1336578930-25141-1-git-send-email-rajesh.borundia@qlogic.com>

From: Sritej Velaga <sritej.velaga@qlogic.com>

disable fw dump by default at start up.

Signed-off-by: Sritej Velaga <sritej.velaga@qlogic.com>
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
---
 .../net/ethernet/qlogic/netxen/netxen_nic_ctx.c    |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_ctx.c b/drivers/net/ethernet/qlogic/netxen/netxen_nic_ctx.c
index f3c0057..c86ea12 100644
--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_ctx.c
+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_ctx.c
@@ -229,7 +229,7 @@ netxen_setup_minidump(struct netxen_adapter *adapter)
 				adapter->mdump.md_template;
 	adapter->mdump.md_capture_buff = NULL;
 	adapter->mdump.fw_supports_md = 1;
-	adapter->mdump.md_enabled = 1;
+	adapter->mdump.md_enabled = 0;
 
 	return err;
 
-- 
1.6.3.3

^ permalink raw reply related

* [PATCHv2 net-next 0/4] netxen fixes
From: Rajesh Borundia @ 2012-05-09 15:55 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman

Please apply it to net-next.

Thanks,
Rajesh

^ permalink raw reply

* [PATCHv2 net-next 4/4] netxen_nic: Fix estimation of recv MSS in case of LRO
From: Rajesh Borundia @ 2012-05-09 15:55 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman
In-Reply-To: <1336578930-25141-1-git-send-email-rajesh.borundia@qlogic.com>

o Linux stack estimates MSS from skb->len or skb_shinfo(skb)->gso_size.
In case of LRO skb->len is aggregate of len of number of packets hence MSS
obtained using skb->len would be incorrect. Incorrect estimation of recv MSS
would lead to delayed acks in some traffic patterns (which sends two or three
packets and wait for ack and only then send remaining packets). This leads to
drop in performance. Hence we need to set gso_size to MSS obtained from firmware.

o This is fixed recently in firmware hence the MSS is obtained based on
capability. If fw is capable of sending the MSS then only driver sets the gso_size.

Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
---
 drivers/net/ethernet/qlogic/netxen/netxen_nic.h    |   10 ++++++++--
 .../net/ethernet/qlogic/netxen/netxen_nic_ctx.c    |    3 +++
 .../net/ethernet/qlogic/netxen/netxen_nic_hdr.h    |    1 +
 .../net/ethernet/qlogic/netxen/netxen_nic_init.c   |    4 +++-
 .../net/ethernet/qlogic/netxen/netxen_nic_main.c   |    9 ++++++++-
 5 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic.h b/drivers/net/ethernet/qlogic/netxen/netxen_nic.h
index 11b4922..37ccbe5 100644
--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic.h
+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic.h
@@ -53,8 +53,8 @@
 
 #define _NETXEN_NIC_LINUX_MAJOR 4
 #define _NETXEN_NIC_LINUX_MINOR 0
-#define _NETXEN_NIC_LINUX_SUBVERSION 78
-#define NETXEN_NIC_LINUX_VERSIONID  "4.0.78"
+#define _NETXEN_NIC_LINUX_SUBVERSION 79
+#define NETXEN_NIC_LINUX_VERSIONID  "4.0.79"
 
 #define NETXEN_VERSION_CODE(a, b, c)	(((a) << 24) + ((b) << 16) + (c))
 #define _major(v)	(((v) >> 24) & 0xff)
@@ -419,6 +419,8 @@ struct rcv_desc {
 	(((sts_data) >> 52) & 0x1)
 #define netxen_get_lro_sts_seq_number(sts_data)		\
 	((sts_data) & 0x0FFFFFFFF)
+#define netxen_get_lro_sts_mss(sts_data1)		\
+	((sts_data1 >> 32) & 0x0FFFF)
 
 
 struct status_desc {
@@ -794,6 +796,7 @@ struct netxen_cmd_args {
 #define NX_CAP0_JUMBO_CONTIGUOUS	NX_CAP_BIT(0, 7)
 #define NX_CAP0_LRO_CONTIGUOUS		NX_CAP_BIT(0, 8)
 #define NX_CAP0_HW_LRO			NX_CAP_BIT(0, 10)
+#define NX_CAP0_HW_LRO_MSS		NX_CAP_BIT(0, 21)
 
 /*
  * Context state
@@ -1073,6 +1076,8 @@ typedef struct {
 #define NX_FW_CAPABILITY_FVLANTX		(1 << 9)
 #define NX_FW_CAPABILITY_HW_LRO			(1 << 10)
 #define NX_FW_CAPABILITY_GBE_LINK_CFG		(1 << 11)
+#define NX_FW_CAPABILITY_MORE_CAPS		(1 << 31)
+#define NX_FW_CAPABILITY_2_LRO_MAX_TCP_SEG	(1 << 2)
 
 /* module types */
 #define LINKEVENT_MODULE_NOT_PRESENT			1
@@ -1155,6 +1160,7 @@ typedef struct {
 #define NETXEN_NIC_BRIDGE_ENABLED       0X10
 #define NETXEN_NIC_DIAG_ENABLED		0x20
 #define NETXEN_FW_RESET_OWNER           0x40
+#define NETXEN_FW_MSS_CAP	        0x80
 #define NETXEN_IS_MSI_FAMILY(adapter) \
 	((adapter)->flags & (NETXEN_NIC_MSI_ENABLED | NETXEN_NIC_MSIX_ENABLED))
 
diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_ctx.c b/drivers/net/ethernet/qlogic/netxen/netxen_nic_ctx.c
index c86ea12..7f556a8 100644
--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_ctx.c
+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_ctx.c
@@ -328,6 +328,9 @@ nx_fw_cmd_create_rx_ctx(struct netxen_adapter *adapter)
 	cap = (NX_CAP0_LEGACY_CONTEXT | NX_CAP0_LEGACY_MN);
 	cap |= (NX_CAP0_JUMBO_CONTIGUOUS | NX_CAP0_LRO_CONTIGUOUS);
 
+	if (adapter->flags & NETXEN_FW_MSS_CAP)
+		cap |= NX_CAP0_HW_LRO_MSS;
+
 	prq->capabilities[0] = cpu_to_le32(cap);
 	prq->host_int_crb_mode =
 		cpu_to_le32(NX_HOST_INT_CRB_MODE_SHARED);
diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_hdr.h b/drivers/net/ethernet/qlogic/netxen/netxen_nic_hdr.h
index a41106b..28e0769 100644
--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_hdr.h
+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_hdr.h
@@ -776,6 +776,7 @@ enum {
 #define CRB_SW_INT_MASK_3		(NETXEN_NIC_REG(0x1e8))
 
 #define CRB_FW_CAPABILITIES_1		(NETXEN_CAM_RAM(0x128))
+#define CRB_FW_CAPABILITIES_2		(NETXEN_CAM_RAM(0x12c))
 #define CRB_MAC_BLOCK_START		(NETXEN_CAM_RAM(0x1c0))
 
 /*
diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c b/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c
index 718b274..0d725dc 100644
--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c
+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c
@@ -1131,7 +1131,6 @@ netxen_validate_firmware(struct netxen_adapter *adapter)
 		 _build(file_fw_ver));
 		return -EINVAL;
 	}
-
 	val = nx_get_bios_version(adapter);
 	netxen_rom_fast_read(adapter, NX_BIOS_VERSION_OFFSET, (int *)&bios);
 	if ((__force u32)val != bios) {
@@ -1661,6 +1660,9 @@ netxen_process_lro(struct netxen_adapter *adapter,
 
 	length = skb->len;
 
+	if (adapter->flags & NETXEN_FW_MSS_CAP)
+		skb_shinfo(skb)->gso_size  =  netxen_get_lro_sts_mss(sts_data1);
+
 	netif_receive_skb(skb);
 
 	adapter->stats.lro_pkts++;
diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c b/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
index d03619c..342b3a7 100644
--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
@@ -1184,6 +1184,7 @@ netxen_nic_attach(struct netxen_adapter *adapter)
 	int err, ring;
 	struct nx_host_rds_ring *rds_ring;
 	struct nx_host_tx_ring *tx_ring;
+	u32 capab2;
 
 	if (adapter->is_up == NETXEN_ADAPTER_UP_MAGIC)
 		return 0;
@@ -1192,6 +1193,13 @@ netxen_nic_attach(struct netxen_adapter *adapter)
 	if (err)
 		return err;
 
+	adapter->flags &= ~NETXEN_FW_MSS_CAP;
+	if (adapter->capabilities & NX_FW_CAPABILITY_MORE_CAPS) {
+		capab2 = NXRD32(adapter, CRB_FW_CAPABILITIES_2);
+		if (capab2 & NX_FW_CAPABILITY_2_LRO_MAX_TCP_SEG)
+			adapter->flags |= NETXEN_FW_MSS_CAP;
+	}
+
 	err = netxen_napi_add(adapter, netdev);
 	if (err)
 		return err;
@@ -1810,7 +1818,6 @@ netxen_tso_check(struct net_device *netdev,
 		flags = FLAGS_VLAN_TAGGED;
 
 	} else if (vlan_tx_tag_present(skb)) {
-
 		flags = FLAGS_VLAN_OOB;
 		vid = vlan_tx_tag_get(skb);
 		netxen_set_tx_vlan_tci(first_desc, vid);
-- 
1.6.3.3

^ permalink raw reply related

* [PATCHv2 net-next 3/4] netxen: added miniDIMM support in driver.
From: Rajesh Borundia @ 2012-05-09 15:55 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman, Sucheta Chakraborty
In-Reply-To: <1336578930-25141-1-git-send-email-rajesh.borundia@qlogic.com>

From: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>

Driver queries DIMM information from firmware and accordingly
sets "presence" field of the structure.
"presence" field when set to 0xff denotes invalid flag. And when
set to 0x0 denotes DIMM memory is not present.

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
---
 drivers/net/ethernet/qlogic/netxen/netxen_nic.h    |    7 +
 .../net/ethernet/qlogic/netxen/netxen_nic_hdr.h    |   25 ++++
 .../net/ethernet/qlogic/netxen/netxen_nic_main.c   |  131 ++++++++++++++++++++
 3 files changed, 163 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic.h b/drivers/net/ethernet/qlogic/netxen/netxen_nic.h
index ecc8d17..11b4922 100644
--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic.h
+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic.h
@@ -1817,6 +1817,13 @@ struct netxen_brdinfo {
 	char short_name[NETXEN_MAX_SHORT_NAME];
 };
 
+struct netxen_dimm_cfg {
+	u8 presence;
+	u8 mem_type;
+	u8 dimm_type;
+	u32 size;
+};
+
 static const struct netxen_brdinfo netxen_boards[] = {
 	{NETXEN_BRDTYPE_P2_SB31_10G_CX4, 1, "XGb CX4"},
 	{NETXEN_BRDTYPE_P2_SB31_10G_HMEZ, 1, "XGb HMEZ"},
diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_hdr.h b/drivers/net/ethernet/qlogic/netxen/netxen_nic_hdr.h
index b1a897c..a41106b 100644
--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_hdr.h
+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_hdr.h
@@ -955,6 +955,31 @@ enum {
 #define NX_CRB_DEV_REF_COUNT		(NETXEN_CAM_RAM(0x138))
 #define NX_CRB_DEV_STATE		(NETXEN_CAM_RAM(0x140))
 
+/* MiniDIMM related macros */
+#define NETXEN_DIMM_CAPABILITY		(NETXEN_CAM_RAM(0x258))
+#define NETXEN_DIMM_PRESENT			0x1
+#define NETXEN_DIMM_MEMTYPE_DDR2_SDRAM	0x2
+#define NETXEN_DIMM_SIZE			0x4
+#define NETXEN_DIMM_MEMTYPE(VAL)		((VAL >> 3) & 0xf)
+#define	NETXEN_DIMM_NUMROWS(VAL)		((VAL >> 7) & 0xf)
+#define	NETXEN_DIMM_NUMCOLS(VAL)		((VAL >> 11) & 0xf)
+#define	NETXEN_DIMM_NUMRANKS(VAL)		((VAL >> 15) & 0x3)
+#define NETXEN_DIMM_DATAWIDTH(VAL)		((VAL >> 18) & 0x3)
+#define NETXEN_DIMM_NUMBANKS(VAL)		((VAL >> 21) & 0xf)
+#define NETXEN_DIMM_TYPE(VAL)		((VAL >> 25) & 0x3f)
+#define NETXEN_DIMM_VALID_FLAG		0x80000000
+
+#define NETXEN_DIMM_MEM_DDR2_SDRAM	0x8
+
+#define NETXEN_DIMM_STD_MEM_SIZE	512
+
+#define NETXEN_DIMM_TYPE_RDIMM	0x1
+#define NETXEN_DIMM_TYPE_UDIMM	0x2
+#define NETXEN_DIMM_TYPE_SO_DIMM	0x4
+#define NETXEN_DIMM_TYPE_Micro_DIMM	0x8
+#define NETXEN_DIMM_TYPE_Mini_RDIMM	0x10
+#define NETXEN_DIMM_TYPE_Mini_UDIMM	0x20
+
 /* Device State */
 #define NX_DEV_COLD		1
 #define NX_DEV_INITALIZING	2
diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c b/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
index 65a718f..d03619c 100644
--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
@@ -2926,6 +2926,134 @@ static struct bin_attribute bin_attr_mem = {
 	.write = netxen_sysfs_write_mem,
 };
 
+static ssize_t
+netxen_sysfs_read_dimm(struct file *filp, struct kobject *kobj,
+		struct bin_attribute *attr,
+		char *buf, loff_t offset, size_t size)
+{
+	struct device *dev = container_of(kobj, struct device, kobj);
+	struct netxen_adapter *adapter = dev_get_drvdata(dev);
+	struct net_device *netdev = adapter->netdev;
+	struct netxen_dimm_cfg dimm;
+	u8 dw, rows, cols, banks, ranks;
+	u32 val;
+
+	if (size != sizeof(struct netxen_dimm_cfg)) {
+		netdev_err(netdev, "Invalid size\n");
+		return -1;
+	}
+
+	memset(&dimm, 0, sizeof(struct netxen_dimm_cfg));
+	val = NXRD32(adapter, NETXEN_DIMM_CAPABILITY);
+
+	/* Checks if DIMM info is valid. */
+	if (val & NETXEN_DIMM_VALID_FLAG) {
+		netdev_err(netdev, "Invalid DIMM flag\n");
+		dimm.presence = 0xff;
+		goto out;
+	}
+
+	rows = NETXEN_DIMM_NUMROWS(val);
+	cols = NETXEN_DIMM_NUMCOLS(val);
+	ranks = NETXEN_DIMM_NUMRANKS(val);
+	banks = NETXEN_DIMM_NUMBANKS(val);
+	dw = NETXEN_DIMM_DATAWIDTH(val);
+
+	dimm.presence = (val & NETXEN_DIMM_PRESENT);
+
+	/* Checks if DIMM info is present. */
+	if (!dimm.presence) {
+		netdev_err(netdev, "DIMM not present\n");
+		goto out;
+	}
+
+	dimm.dimm_type = NETXEN_DIMM_TYPE(val);
+
+	switch (dimm.dimm_type) {
+	case NETXEN_DIMM_TYPE_RDIMM:
+	case NETXEN_DIMM_TYPE_UDIMM:
+	case NETXEN_DIMM_TYPE_SO_DIMM:
+	case NETXEN_DIMM_TYPE_Micro_DIMM:
+	case NETXEN_DIMM_TYPE_Mini_RDIMM:
+	case NETXEN_DIMM_TYPE_Mini_UDIMM:
+		break;
+	default:
+		netdev_err(netdev, "Invalid DIMM type %x\n", dimm.dimm_type);
+		goto out;
+	}
+
+	if (val & NETXEN_DIMM_MEMTYPE_DDR2_SDRAM)
+		dimm.mem_type = NETXEN_DIMM_MEM_DDR2_SDRAM;
+	else
+		dimm.mem_type = NETXEN_DIMM_MEMTYPE(val);
+
+	if (val & NETXEN_DIMM_SIZE) {
+		dimm.size = NETXEN_DIMM_STD_MEM_SIZE;
+		goto out;
+	}
+
+	if (!rows) {
+		netdev_err(netdev, "Invalid no of rows %x\n", rows);
+		goto out;
+	}
+
+	if (!cols) {
+		netdev_err(netdev, "Invalid no of columns %x\n", cols);
+		goto out;
+	}
+
+	if (!banks) {
+		netdev_err(netdev, "Invalid no of banks %x\n", banks);
+		goto out;
+	}
+
+	ranks += 1;
+
+	switch (dw) {
+	case 0x0:
+		dw = 32;
+		break;
+	case 0x1:
+		dw = 33;
+		break;
+	case 0x2:
+		dw = 36;
+		break;
+	case 0x3:
+		dw = 64;
+		break;
+	case 0x4:
+		dw = 72;
+		break;
+	case 0x5:
+		dw = 80;
+		break;
+	case 0x6:
+		dw = 128;
+		break;
+	case 0x7:
+		dw = 144;
+		break;
+	default:
+		netdev_err(netdev, "Invalid data-width %x\n", dw);
+		goto out;
+	}
+
+	dimm.size = ((1 << rows) * (1 << cols) * dw * banks * ranks) / 8;
+	/* Size returned in MB. */
+	dimm.size = (dimm.size) / 0x100000;
+out:
+	memcpy(buf, &dimm, sizeof(struct netxen_dimm_cfg));
+	return sizeof(struct netxen_dimm_cfg);
+
+}
+
+static struct bin_attribute bin_attr_dimm = {
+	.attr = { .name = "dimm", .mode = (S_IRUGO | S_IWUSR) },
+	.size = 0,
+	.read = netxen_sysfs_read_dimm,
+};
+
 
 static void
 netxen_create_sysfs_entries(struct netxen_adapter *adapter)
@@ -2963,6 +3091,8 @@ netxen_create_diag_entries(struct netxen_adapter *adapter)
 		dev_info(dev, "failed to create crb sysfs entry\n");
 	if (device_create_bin_file(dev, &bin_attr_mem))
 		dev_info(dev, "failed to create mem sysfs entry\n");
+	if (device_create_bin_file(dev, &bin_attr_dimm))
+		dev_info(dev, "failed to create dimm sysfs entry\n");
 }
 
 
@@ -2975,6 +3105,7 @@ netxen_remove_diag_entries(struct netxen_adapter *adapter)
 	device_remove_file(dev, &dev_attr_diag_mode);
 	device_remove_bin_file(dev, &bin_attr_crb);
 	device_remove_bin_file(dev, &bin_attr_mem);
+	device_remove_bin_file(dev, &bin_attr_dimm);
 }
 
 #ifdef CONFIG_INET
-- 
1.6.3.3

^ permalink raw reply related

* RE: [net-next 2/8] e1000e: initial support for i217
From: Allan, Bruce W @ 2012-05-09 16:25 UTC (permalink / raw)
  To: Bjørn Mork, Kirsher, Jeffrey T
  Cc: davem@davemloft.net, netdev@vger.kernel.org, gospo@redhat.com,
	sassmann@redhat.com
In-Reply-To: <87mx5njbto.fsf@nemi.mork.no>

> -----Original Message-----
> From: Bjørn Mork [mailto:bjorn@mork.no]
> Sent: Saturday, May 05, 2012 1:02 AM
> To: Kirsher, Jeffrey T
> Cc: davem@davemloft.net; Allan, Bruce W; netdev@vger.kernel.org;
> gospo@redhat.com; sassmann@redhat.com
> Subject: Re: [net-next 2/8] e1000e: initial support for i217
> 
> Jeff Kirsher <jeffrey.t.kirsher@intel.com> writes:
> 
> > diff --git a/drivers/net/ethernet/intel/e1000e/defines.h
> b/drivers/net/ethernet/intel/e1000e/defines.h
> > index 3a50259..11c4666 100644
> > --- a/drivers/net/ethernet/intel/e1000e/defines.h
> > +++ b/drivers/net/ethernet/intel/e1000e/defines.h
> > @@ -74,7 +74,9 @@
> >  #define E1000_WUS_BC           E1000_WUFC_BC
> >
> >  /* Extended Device Control */
> > +#define E1000_CTRL_EXT_LPCD  0x00000004     /* LCD Power Cycle Done
> */
> >  #define E1000_CTRL_EXT_SDP3_DATA 0x00000080 /* Value of SW
> Definable Pin 3 */
> > +#define E1000_CTRL_EXT_FORCE_SMBUS 0x00000004 /* Force SMBus mode*/
> >  #define E1000_CTRL_EXT_EE_RST    0x00002000 /* Reinitialize from
> EEPROM */
> >  #define E1000_CTRL_EXT_SPD_BYPS  0x00008000 /* Speed Select Bypass
> */
> >  #define E1000_CTRL_EXT_RO_DIS    0x00020000 /* Relaxed Ordering
> disable */
> 
> The mangled sorting and alignment of the new entries made me wonder if
> this was a typo.  But reading further below it looks like
> E1000_CTRL_EXT_LPCD is input and E1000_CTRL_EXT_FORCE_SMBUS is output.
> If that is correct, then it probably deserves a small comment here
> along
> with better sorting and alignment to make it clear that the duplicate
> value is intentional?
> 
> 
> Bjørn

It is a typo.  An updated patch will follow soon.

Thanks for spotting that,
Bruce.

^ permalink raw reply

* [3.2.y] phy:icplus:fix Auto Power Saving in ip101a_config_init.
From: Jonathan Nieder @ 2012-05-09 17:18 UTC (permalink / raw)
  To: stable
  Cc: David Miller, srinivas.kandagatla, netdev, peppe.cavallaro,
	Ben Hutchings
In-Reply-To: <20120509.115729.1982766903625560504.davem@davemloft.net>

From: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Date: Mon, 2 Apr 2012 00:02:09 +0000

[ Upstream commit b3300146aa8efc5d3937fd33f3cfdc580a3843bc ]

This patch fixes Auto Power Saving configuration in ip101a_config_init
which was broken as there is no phy register write followed after
setting IP101A_APS_ON flag.

This patch also fixes the return value of ip101a_config_init.

Without this patch ip101a_config_init returns 2 which is not an error
accroding to IS_ERR and the mac driver will continue accessing 2 as
valid pointer to phy_dev resulting in memory fault.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
David Miller wrote:
> From: Jonathan Nieder <jrnieder@gmail.com>

>> I'm guessing 3.2.y needs this, too, since it also includes
>> v3.2-rc1~129^2~247 ("net/phy: add IC+ IP101A and support APS").
>>
>> Here's a quick backport made by adjusting the context line to
>> use IP101A_APS_ON instead of IP101A_G_APS_ON.
>>
>> Please include it in some later stable update if appropriate.
>
> Please submit this directly to the stable list.

Thanks for looking it over.  Doing so.

 drivers/net/phy/icplus.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/icplus.c b/drivers/net/phy/icplus.c
index c81f136ae670..b14230016d9b 100644
--- a/drivers/net/phy/icplus.c
+++ b/drivers/net/phy/icplus.c
@@ -150,7 +150,8 @@ static int ip101a_config_init(struct phy_device *phydev)
 	/* Enable Auto Power Saving mode */
 	c = phy_read(phydev, IP10XX_SPEC_CTRL_STATUS);
 	c |= IP101A_APS_ON;
-	return c;
+
+	return phy_write(phydev, IP10XX_SPEC_CTRL_STATUS, c);
 }
 
 static int ip175c_read_status(struct phy_device *phydev)
-- 
1.7.10.1

^ permalink raw reply related

* [PATCH] pktgen: fix crash at module unload
From: Eric Dumazet @ 2012-05-09 19:21 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Eric W. Biederman

From: Eric Dumazet <edumazet@google.com>

commit 7d3d43dab4e9 (net: In unregister_netdevice_notifier unregister
the netdevices.) makes pktgen crashing at module unload.

[  296.820578] BUG: spinlock bad magic on CPU#6, rmmod/3267
[  296.820719]  lock: ffff880310c38000, .magic: ffff8803, .owner: <none>/-1, .owner_cpu: -1
[  296.820943] Pid: 3267, comm: rmmod Not tainted 3.4.0-rc5+ #254
[  296.821079] Call Trace:
[  296.821211]  [<ffffffff8168a715>] spin_dump+0x8a/0x8f
[  296.821345]  [<ffffffff8168a73b>] spin_bug+0x21/0x26
[  296.821507]  [<ffffffff812b4741>] do_raw_spin_lock+0x131/0x140
[  296.821648]  [<ffffffff8169188e>] _raw_spin_lock+0x1e/0x20
[  296.821786]  [<ffffffffa00cc0fd>] __pktgen_NN_threads+0x4d/0x140 [pktgen]
[  296.821928]  [<ffffffffa00ccf8d>] pktgen_device_event+0x10d/0x1e0 [pktgen]
[  296.822073]  [<ffffffff8154ed4f>] unregister_netdevice_notifier+0x7f/0x100
[  296.822216]  [<ffffffffa00d2a0b>] pg_cleanup+0x48/0x73 [pktgen]
[  296.822357]  [<ffffffff8109528e>] sys_delete_module+0x17e/0x2a0
[  296.822502]  [<ffffffff81699652>] system_call_fastpath+0x16/0x1b

Fix this by deleting objects from pktgen_threads list before their
freeing.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 net/core/pktgen.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index ffb5d38..f632abf 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -3762,6 +3762,7 @@ static void __exit pg_cleanup(void)
 	list_for_each_safe(q, n, &pktgen_threads) {
 		t = list_entry(q, struct pktgen_thread, th_list);
 		kthread_stop(t->tsk);
+		list_del(&t->th_list);
 		kfree(t);
 	}
 

^ permalink raw reply related

* Re: [PATCH] pktgen: fix crash at module unload
From: Eric W. Biederman @ 2012-05-09 21:23 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1336591268.12504.62.camel@edumazet-glaptop>

Eric Dumazet <eric.dumazet@gmail.com> writes:

> From: Eric Dumazet <edumazet@google.com>
>
> commit 7d3d43dab4e9 (net: In unregister_netdevice_notifier unregister
> the netdevices.) makes pktgen crashing at module unload.

Awesome an anceint race was flushed out.

> [  296.820578] BUG: spinlock bad magic on CPU#6, rmmod/3267
> [  296.820719]  lock: ffff880310c38000, .magic: ffff8803, .owner: <none>/-1, .owner_cpu: -1
> [  296.820943] Pid: 3267, comm: rmmod Not tainted 3.4.0-rc5+ #254
> [  296.821079] Call Trace:
> [  296.821211]  [<ffffffff8168a715>] spin_dump+0x8a/0x8f
> [  296.821345]  [<ffffffff8168a73b>] spin_bug+0x21/0x26
> [  296.821507]  [<ffffffff812b4741>] do_raw_spin_lock+0x131/0x140
> [  296.821648]  [<ffffffff8169188e>] _raw_spin_lock+0x1e/0x20
> [  296.821786]  [<ffffffffa00cc0fd>] __pktgen_NN_threads+0x4d/0x140 [pktgen]
> [  296.821928]  [<ffffffffa00ccf8d>] pktgen_device_event+0x10d/0x1e0 [pktgen]
> [  296.822073]  [<ffffffff8154ed4f>] unregister_netdevice_notifier+0x7f/0x100
> [  296.822216]  [<ffffffffa00d2a0b>] pg_cleanup+0x48/0x73 [pktgen]
> [  296.822357]  [<ffffffff8109528e>] sys_delete_module+0x17e/0x2a0
> [  296.822502]  [<ffffffff81699652>] system_call_fastpath+0x16/0x1b
>
> Fix this by deleting objects from pktgen_threads list before their
> freeing.

That seems reasonable.

Would it be easier to call unregister_netdevice_notifer before shutting
down the threads, so you don't have the weird cases to deal with during
shutdown?

It looks like pg_cleanup doesn't take the pktgen_thread_lock, so
I suspect that there are still races.


Eric


> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  net/core/pktgen.c |    1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/net/core/pktgen.c b/net/core/pktgen.c
> index ffb5d38..f632abf 100644
> --- a/net/core/pktgen.c
> +++ b/net/core/pktgen.c
> @@ -3762,6 +3762,7 @@ static void __exit pg_cleanup(void)
>  	list_for_each_safe(q, n, &pktgen_threads) {
>  		t = list_entry(q, struct pktgen_thread, th_list);
>  		kthread_stop(t->tsk);
> +		list_del(&t->th_list);
>  		kfree(t);
>  	}
>  

^ permalink raw reply

* [PATCH] netfilter/xt_CT.c: remove redundant header include
From: Eldad Zack @ 2012-05-09 22:03 UTC (permalink / raw)
  To: David S. Miller, Pablo Neira Ayuso, Patrick McHardy
  Cc: netfilter-devel, netfilter, coreteam, netdev, linux-kernel,
	Eldad Zack

nf_conntrack_l4proto.h is included twice.

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
---
 net/netfilter/xt_CT.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/net/netfilter/xt_CT.c b/net/netfilter/xt_CT.c
index 3746d8b..a51de9b 100644
--- a/net/netfilter/xt_CT.c
+++ b/net/netfilter/xt_CT.c
@@ -17,7 +17,6 @@
 #include <net/netfilter/nf_conntrack_l4proto.h>
 #include <net/netfilter/nf_conntrack_helper.h>
 #include <net/netfilter/nf_conntrack_ecache.h>
-#include <net/netfilter/nf_conntrack_l4proto.h>
 #include <net/netfilter/nf_conntrack_timeout.h>
 #include <net/netfilter/nf_conntrack_zones.h>
 
-- 
1.7.10


^ permalink raw reply related

* Re: [PATCH 0/5] netfilter updates for net-next (upcoming 3.5), batch 2
From: David Miller @ 2012-05-09 22:11 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev
In-Reply-To: <1336563188-6720-1-git-send-email-pablo@netfilter.org>

From: pablo@netfilter.org
Date: Wed,  9 May 2012 13:33:03 +0200

> This is a second batch of netfilter updates for net-next, they contain:
> 
> * The new HMARK target from Hans Schillstrom. It took lots of spins
>   to get this into shape. This target provides a hash-based packet / flow
>   pre-classifier for iptables that can be used to distribute packets
>   / flows between uplinks and backend servers. It provides to modes, one
>   that relies on conntrack, and one that is stateless per-packet.
> 
> * Byte-based cost calculation for the hashlimit match, to detect when
>   a host consumes more bandwidth than expected. This patch from Florian
>   Westphal.
> 
> You can pull these changes from:
> 
> git://1984.lsi.us.es/net-next

Pulled.

Two suggested improvements:

1) The HMARK hash is quite expensive, because it uses a modulus.

   Consider adjusting it to use the usual trick:

	((u64)(HASH_VAL * HASH_SIZE)) >> 32

   so that this can be a multiply instead of a modulus.

2) XT_HASHLIMIT_MAX is cumbersome.

   The canonical way to validate if the set bits are in	a
   valid range is to have a "_ALL" macro, and test:

	if (val & ~XT_HASHLIMIT_ALL)
		goto err;

   or similar.

Thanks.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox