Netdev List


* Re: [PATCH 4/7] netprio_cgroup: reimplement priomap expansion
From: Daniel Wagner @ 2012-11-20 15:16 UTC (permalink / raw)
  To: Tejun Heo
  Cc: serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	tgraf-G/eBtMaohhA, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	lizefan-hv44wF8Li93QT0dZR+AlfA, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20121120151322.GR15971-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>

On 20.11.2012 16:13, Tejun Heo wrote:
> Hello,
>
> On Tue, Nov 20, 2012 at 04:09:22PM +0100, Daniel Wagner wrote:
>> Thanks for the explanation. I was pondering whether growing the size in
>> powers of two could be a bit excessive and whether the allocation step
>> could be linear instead, e.g. a fixed step of 4096. target_id will
>> increase linearly, therefore linear growth might also be enough, no?
>
> Well, power-of-two resizing tends to behave relatively well under most
> cases and slab allocations are binned by power-of-two sizes, so
> linearly growing it doesn't really buy anything.

Convinced :)

Tested and Acked-by: Daniel Wagner <daniel.wagner-98C5kh4wR6ohFhg+JK9F0w@public.gmane.org>

thanks,
daniel

^ permalink raw reply

* Re: [PATCH 4/7] netprio_cgroup: reimplement priomap expansion
From: Tejun Heo @ 2012-11-20 15:13 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: nhorman-2XuSBdqkA4R54TAoqtyWWQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w, tgraf-G/eBtMaohhA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <50AB9D22.5030000-kQCPcA+X3s7YtjvyW6yDsg@public.gmane.org>

Hello,

On Tue, Nov 20, 2012 at 04:09:22PM +0100, Daniel Wagner wrote:
> Thanks for the explanation. I was pondering whether growing the size in
> powers of two could be a bit excessive and whether the allocation step
> could be linear instead, e.g. a fixed step of 4096. target_id will
> increase linearly, therefore linear growth might also be enough, no?

Well, power-of-two resizing tends to behave relatively well under most
cases and slab allocations are binned by power-of-two sizes, so
linearly growing it doesn't really buy anything.
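To make the slab-binning argument concrete, here is a small C sketch under a simplified model in which every allocation is rounded up to the next power of two. Real kmalloc caches also have a few non-power-of-two classes, such as 96 and 192 bytes, so this is only an approximation. `pow2_bucket()`, `grow_linear()` and `grow_doubling()` are hypothetical helpers for illustration, not kernel functions.

```c
#include <stddef.h>

/* Simplified size-class model: round n up to the next power of two. */
static size_t pow2_bucket(size_t n)
{
	size_t b = 1;

	while (b < n)
		b <<= 1;
	return b;
}

struct growth {
	unsigned int reallocs; /* how many times the table was resized */
	size_t footprint;      /* bytes the allocator actually hands out */
};

/* Grow from `start` bytes by a fixed `step` until `need` bytes fit. */
static struct growth grow_linear(size_t start, size_t step, size_t need)
{
	struct growth g = { 0, 0 };
	size_t sz = start;

	while (sz < need) {
		sz += step;
		g.reallocs++;
	}
	g.footprint = pow2_bucket(sz);
	return g;
}

/* Grow from `start` bytes by doubling until `need` bytes fit. */
static struct growth grow_doubling(size_t start, size_t need)
{
	struct growth g = { 0, 0 };
	size_t sz = start;

	while (sz < need) {
		sz *= 2;
		g.reallocs++;
	}
	g.footprint = pow2_bucket(sz);
	return g;
}
```

For example, growing from 4096 bytes to hold 40000 bytes takes 9 linear steps of 4096 but only 4 doublings, and in this model both end up in a 65536-byte bucket: the linear scheme saves no memory while resizing more often.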

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [PATCH 7/7] netprio_cgroup: allow nesting and inherit config on cgroup creation
From: Daniel Wagner @ 2012-11-20 15:10 UTC (permalink / raw)
  To: Tejun Heo
  Cc: nhorman-2XuSBdqkA4R54TAoqtyWWQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w, tgraf-G/eBtMaohhA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <20121120144036.GP15971-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>

Hi Tejun,

On 20.11.2012 15:40, Tejun Heo wrote:
> Hello, Daniel.
>
> On Tue, Nov 20, 2012 at 09:57:14AM +0100, Daniel Wagner wrote:
>>> -static void cgrp_css_free(struct cgroup *cgrp)
>>> +static int cgrp_css_online(struct cgroup *cgrp)
>>>   {
>>> -	struct cgroup_netprio_state *cs = cgrp_netprio_state(cgrp);
>>> +	struct cgroup *parent = cgrp->parent;
>>>   	struct net_device *dev;
>>> +	int ret = 0;
>>> +
>>> +	if (!parent)
>>> +		return 0;
>>
>> BTW, parent is always != NULL, because the root cgroup will be
>> attached to the dummytop cgroup.
>
> Hmmm?  I'm confused.  When ->css_online() is called for dummytop in
> cgroup_init_subsys(), its parent is NULL.  What am I missing?

Forget my comment, I was really confused when writing this. I was 
looking only at cgroups which were created after bootup.

cheers,
daniel

^ permalink raw reply

* Re: [PATCH 4/7] netprio_cgroup: reimplement priomap expansion
From: Daniel Wagner @ 2012-11-20 15:09 UTC (permalink / raw)
  To: Tejun Heo
  Cc: serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	tgraf-G/eBtMaohhA, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	lizefan-hv44wF8Li93QT0dZR+AlfA, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20121120143832.GO15971-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>

On 20.11.2012 15:38, Tejun Heo wrote:
> Hello, Daniel.
>
> On Tue, Nov 20, 2012 at 09:46:22AM +0100, Daniel Wagner wrote:
>> struct netprio_map {
>> 	struct rcu_head rcu;
>> 	struct netprio_aux *aux;	/* auxiliary config array */
>> 	u32 priomap_len;
>> 	u32 priomap[];
>> };
>>
>> Is there a specific reason why aux and priomap are handled
>> differently? Couldn't you just use the same approach for both variables,
>> e.g. re/allocating only them here and leaving the allocation of struct
>> netprio_map in cgrp_css_alloc()?
>
> ->aux is no longer added, so the consistency issue doesn't exist
> anymore.

Right, I got confused looking at v1 and v2.

> The reason why they were handled differently before (or
> rather why I didn't change priomap[] to be allocated separately) was
> that pointer chasing tends to be more expensive than offsetting.  I
> don't know how much effect it would have in this case but things
> sitting in packet in/out paths can be very hot so didn't wanna disturb
> it.

I see.

>> Also the algorithm to figure out the size of the array might be a
>> bit too aggressive in my opinion. So you always start at
>> PRIOMAP_MIN_SIZE and then try to double the size until target_idx
>> fits. Wouldn't it make sense to start to look for the new size
>> beginning at old->priomap_len and then do the power-of-two increase?
>
> The only downside of always starting from PRIOMAP_MIN_SIZE is
> iterating several more times in the sizing loop which isn't really
> anything to worry about.  The loop is structured that way because I
> wanted to keep the size of the whole thing power-of-two.  Due to the
> fields before priomap[], if we size priomap_len power-of-two, we'll
> always end up with something slightly over power-of-two, which is
> usually the worst size to allocate.

Thanks for the explanation. I was pondering whether growing the size in
powers of two could be a bit excessive and whether the allocation step
could be linear instead, e.g. a fixed step of 4096. target_id will
increase linearly, therefore linear growth might also be enough, no?

cheers,
daniel

^ permalink raw reply

* Re: [PATCH v2] checkpatch: add double empty line check
From: Eilon Greenstein @ 2012-11-20 15:07 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: Joe Perches, David Rientjes, linux-kernel, netdev
In-Reply-To: <20121120144329.GE7955@dm>

On Tue, 2012-11-20 at 14:43 +0000, Andy Whitcroft wrote:

> > > Also this fails if the fragment
> > > is at the top of the hunk, emitting a perl warning.
> > 
> > I did not see this warning. Can you please share this example? I tried
> > adding a couple of empty lines at the beginning of a file and it seemed
> > to work just fine for me (using perl v5.14.2).
> 
> Ok, this is actually if it is at the bottom, not the top.  So if you
> have a range of lines newly added to the bottom of the file.  Leading to
> this warning:
> 
> Use of uninitialized value within @rawlines in pattern match (m//) at
> ../checkpatch/scripts/checkpatch.pl line 3586.

Oh... of course, I should have seen that. I did not check changes at the
end of the file.

What do you think about something like this (taking $nextline from
@rawlines only if it is defined):
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 21a9f5d..c0c610c 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3579,6 +3579,19 @@ sub process {
 			WARN("EXPORTED_WORLD_WRITABLE",
 			     "Exporting world writable files is usually an error. Consider more restrictive permissions.\n" . $herecurr);
 		}
+
+# check for double empty lines
+		if ($line =~ /^\+\s*$/) {
+			my $nextline = "";
+			if (defined($rawlines[$linenr])) {
+				$nextline = $rawlines[$linenr];
+			}
+			if ($nextline =~ /^\s*$/ ||
+			    $prevline =~ /^\+?\s*$/ && $nextline !~ /^\+\s*$/) {
+				CHK("DOUBLE_EMPTY_LINE",
+				    "One empty line should be sufficient. Consider removing this one.\n" . $herecurr);
+			}
+		}
 	}
 
 	# If we have no input at all, then there is nothing to report on

If you think it's appropriate, I will send a clean copy.

Thanks,
Eilon

^ permalink raw reply related

* Re: [Xen-devel] [PATCH] xen/netfront: handle compound page fragments on transmit
From: Ian Campbell @ 2012-11-20 15:06 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefan Bader, Sander Eikelenboom, Eric Dumazet,
	Konrad Rzeszutek Wilk, xen-devel@lists.xen.org, ANNIE LI,
	netdev@vger.kernel.org
In-Reply-To: <50ABA29902000078000A9FB1@nat28.tlf.novell.com>

On Tue, 2012-11-20 at 14:32 +0000, Jan Beulich wrote:
> > @@ -517,15 +540,16 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
> >  	grant_ref_t ref;
> >  	unsigned long mfn;
> >  	int notify;
> > -	int frags = skb_shinfo(skb)->nr_frags;
> > +	int slots;
> >  	unsigned int offset = offset_in_page(data);
> >  	unsigned int len = skb_headlen(skb);
> >  	unsigned long flags;
> >  
> > -	frags += DIV_ROUND_UP(offset + len, PAGE_SIZE);
> > -	if (unlikely(frags > MAX_SKB_FRAGS + 1)) {
> > -		printk(KERN_ALERT "xennet: skb rides the rocket: %d frags\n",
> > -		       frags);
> > +	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
> > +		xennet_count_skb_frag_slots(skb);
> > +	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> 
> But still - isn't this wrong now (i.e. can't it now validly exceed the
> boundary checked for)?

In practice, no: the number of pages backing the frags is
<= MAX_SKB_FRAGS even when the frags are compound pages.

Ian.

^ permalink raw reply

* Fwd: Network soft and hard irqs statistics
From: Javier Domingo @ 2012-11-20 15:05 UTC (permalink / raw)
  To: netdev
In-Reply-To: <CALZVapkHda-tYNJALJWjhGwFBjAet84gxJam2UoK9WzMKQE6Bw@mail.gmail.com>

I have released the mentioned code in

https://github.com/txomon/linux

It currently causes kernel panics due to a page fault in
net_rx_action, because I didn't know how to fit this into the current
kernel, but I am working on an alternative solution:

https://github.com/txomon/linux/blob/affde7645451eb62cdd1993a8cef7b5325e30b96/net/core/dev.c#L3944

Hope someone can help me now :D

Javier Domingo



2012/11/15 Javier Domingo <javierdo1@gmail.com>
>
> Hello all,
>
> I am migrating some statistics we use in our research group to v3.6.
> I don't think this will be useful for anyone, as the statistics
> measure softirqs, hardirqs, the time spent in them, etc.
>
> We modified the net_device structure to contain a structure with
> several statistics fields.
>
> We patched the e1000 and tg3 drivers to measure hardirq and polling
> times. We also patched net_rx_action (the softirq) to record whether
> it exits because the budget or the time limit (in jiffies) ran out,
> and netif_receive_skb to measure times and count how many packets
> are captured.
>
> Until now, we have been using an external module that accessed these
> variables, creating proc entries and allowing us to reset those
> measurements.
>
> Now I am trying to do it in the most standard way, so that when I talk
> to my boss, he will allow me to release the code.
>
> The main aim of this mail is to gauge interest in this work and to
> ask a few questions:
>
> -> Where should I create the proc entry? We currently use
> /proc/net/stats/<netdev>. I have also thought about introducing that
> entry in fs/proc/proc_net.c, but I am not sure what the conventions
> are...
>
> -> When migrating net_rx_action, I found that we used this line:
> if(cpus_equal(mask,irq_desc[timedev->irq].affinity))
> before counting whether we exit because of the budget or because of
> jiffies, (I suppose) to check that the softirq was the one assigned to
> this processor. Is that needed? I mean, the softirq runs on just one
> of them... I don't really understand why it is important, so I would
> be glad if anyone could explain it to me.
>
> -> We have patched the hardirq handlers in the drivers, and the polling
> times too. I know the hardirq handlers are the only place where we can
> measure hardirq times, but would it be possible to measure the polls
> around the n->poll() call in net_rx_action (dev.c), instead of in
> e1000_clean (for example)?
>    We have been doing it like this because I was told that the context
> change was important... But I am not sure how important it is, so any
> tip on this would be welcome.
>
> -> In tg3.c I have seen that there are several hardirq functions.
> Although we usually only patched tg3_interrupt_tagged, I have patched
> all of them (whatever they might be for). Why are there so many of
> them? Is that in preparation for multiqueue cards?
>
> I hope someone can answer my questions, and that I haven't asked too
> many newbie questions.
>
> Best regards,
>
> Javier Domingo
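The measurement asked about above (timing each n->poll() call and recording why the softirq loop stops) can be sketched as a userspace model. Everything here is hypothetical: `struct rx_stats`, `rx_loop()` and `queue_poll()` are illustrative names, a single poll callback stands in for the per-CPU poll list, and `clock()` (processor time) stands in for jiffies. The two exit counters correspond to the budget and time-limit exits described in the mail.

```c
#include <time.h>

/* Hypothetical per-device counters, in the spirit of the fields above. */
struct rx_stats {
	unsigned long budget_exits; /* stopped: budget used up */
	unsigned long time_exits;   /* stopped: time limit passed */
	unsigned long drained;      /* stopped: poll did less than weight */
	double poll_seconds;        /* processor time spent inside polls */
};

/* Poll callback: handle up to `weight` packets and return how many. */
typedef int (*poll_fn)(void *ctx, int weight);

/* Example callback: ctx points at a count of pending packets. */
static int queue_poll(void *ctx, int weight)
{
	int *pending = ctx;
	int n = *pending < weight ? *pending : weight;

	*pending -= n;
	return n;
}

static double cpu_seconds(void)
{
	return (double)clock() / CLOCKS_PER_SEC;
}

/* Poll until out of budget, out of time or out of work, timing each
 * poll call and counting which condition ended the loop. */
static void rx_loop(poll_fn poll_cb, void *ctx, int budget, int weight,
		    double time_limit, struct rx_stats *st)
{
	double start = cpu_seconds();

	for (;;) {
		double t0;
		int work;

		if (budget <= 0) {
			st->budget_exits++;
			return;
		}
		if (cpu_seconds() - start > time_limit) {
			st->time_exits++;
			return;
		}
		t0 = cpu_seconds();
		work = poll_cb(ctx, weight); /* timed around the poll call */
		st->poll_seconds += cpu_seconds() - t0;
		budget -= work;
		if (work < weight) {
			st->drained++;
			return;
		}
	}
}
```

Timing around the callback in one place means every driver's poll routine is covered without patching each driver, at the cost of also counting any work the core does inside that call.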

^ permalink raw reply

* Re: [PATCH] xen/netfront: handle compound page fragments on transmit
From: Ian Campbell @ 2012-11-20 15:05 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev@vger.kernel.org, xen-devel@lists.xen.org, Eric Dumazet,
	Konrad Rzeszutek Wilk, ANNIE LI, Sander Eikelenboom, Stefan Bader
In-Reply-To: <1353422735.2590.6.camel@edumazet-glaptop>

On Tue, 2012-11-20 at 14:45 +0000, Eric Dumazet wrote:
> > +		/* Skip unused frames from start of page */
> 
> 'frame' in the comment means an order-0 page ?

Yes. Confusing in the context of a network driver I know! I couldn't
think of a better term.

> > +			id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs);
> > +			np->tx_skbs[id].skb = skb_get(skb);
> 
> BTW this skb_get() means extra atomic operations for every 4096 bytes
> unit, and an extra atomic op (and test for final 0) at TX completion.
> This could be avoided, by setting np->tx_skbs[id].skb = skb only for the
> very last unit.

Thanks. Might be tricky because guests can ack the individual requests
in any order but it's something worth having a look at.

> >  	np->tx.req_prod_pvt = prod;
> 
> Acked-by: Eric Dumazet <edumazet@google.com>
> 
> Thanks !

Thanks for the review.

Ian.

> 
> 

^ permalink raw reply

* Re: question about eth_header
From: Rami Rosen @ 2012-11-20 14:51 UTC (permalink / raw)
  To: Dmitry Kravkov; +Cc: netdev
In-Reply-To: <1353419044.16036.6.camel@lb-tlvb-dmitry>

Hi,
I think that you should call skb_reset_mac_header()
before calling eth_hdr(), and that skb_reset_mac_header() should not be
inside eth_hdr(). Here is the explanation:

In the RX path, when we get a packet in the driver, what essentially
happens is that at this point, we have a pointer to
the packet buffer in skb->data, and its length in skb->len.

With Ethernet packets, the driver calls eth_type_trans(); see the code
of any Ethernet driver. eth_type_trans() sets the MAC header pointer by
calling skb_reset_mac_header() on the skb.

Afterwards, eth_type_trans() advances skb->data by 14 bytes (ETH_HLEN,
the size of the Ethernet header) and decreases skb->len accordingly,
by calling skb_pull_inline(skb, ETH_HLEN).
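The pointer bookkeeping described above can be modelled in a few lines of C. This is a sketch with a hypothetical `struct skb_model` holding only the three fields involved; the `model_*` helpers imitate skb_reset_mac_header(), a pull, eth_hdr() and eth_type_trans(), and are not the kernel implementations.

```c
#include <stddef.h>

#define MODEL_ETH_HLEN 14 /* Ethernet header length, like ETH_HLEN */

/* Hypothetical miniature of the sk_buff fields discussed here. */
struct skb_model {
	unsigned char *data;       /* current start of packet data */
	unsigned int len;          /* bytes from data to the end */
	unsigned char *mac_header; /* recorded start of the MAC header */
};

static void model_reset_mac_header(struct skb_model *skb)
{
	skb->mac_header = skb->data;
}

static unsigned char *model_pull(struct skb_model *skb, unsigned int n)
{
	if (n > skb->len)
		return NULL;
	skb->len -= n;
	skb->data += n;
	return skb->data;
}

/* Like eth_hdr(): only returns the recorded pointer, no reset. */
static unsigned char *model_eth_hdr(const struct skb_model *skb)
{
	return skb->mac_header;
}

/* Mark the MAC header, pull it off and return the EtherType as a
 * host-order integer. The real eth_type_trans() returns it in network
 * byte order and also sets skb->dev and skb->pkt_type. */
static unsigned int model_eth_type_trans(struct skb_model *skb)
{
	const unsigned char *eth;

	model_reset_mac_header(skb);
	eth = model_eth_hdr(skb);
	model_pull(skb, MODEL_ETH_HLEN);
	return (unsigned int)eth[12] << 8 | eth[13];
}
```

After the call, `data` points past the Ethernet header while `model_eth_hdr()` still returns the start of the frame, so it can be called repeatedly; resetting the MAC header again at that point would make it point at the payload instead.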

In case you are wondering why skb_reset_mac_header() is not inside
eth_hdr(): I think the reason is that, once skb_reset_mac_header() has
been called on the skb, there are cases where we want to call eth_hdr()
several times without resetting the MAC header each time, for example
in the bridging code. There is no reason to redo it on every call, and
doing so would only cost performance.

rgs,
Rami Rosen

http://ramirose.wix.com/ramirosen

On Tue, Nov 20, 2012 at 3:44 PM, Dmitry Kravkov <dmitry@broadcom.com> wrote:
> Hi,
>
> I am trying to use eth_hdr() but it looks like it doesn't point to mac
> header (mac header is present at skb->data). Shouldn't
> skb_reset_mac_header() be called inside eth_header()?
>
> Thanks.

^ permalink raw reply

* Re: [PATCH] xen/netfront: handle compound page fragments on transmit
From: Eric Dumazet @ 2012-11-20 14:45 UTC (permalink / raw)
  To: Ian Campbell
  Cc: netdev, xen-devel, Eric Dumazet, Konrad Rzeszutek Wilk, ANNIE LI,
	Sander Eikelenboom, Stefan Bader
In-Reply-To: <1353411606-15940-1-git-send-email-ian.campbell@citrix.com>

On Tue, 2012-11-20 at 11:40 +0000, Ian Campbell wrote:
> An SKB paged fragment can consist of a compound page with order > 0.
> However the netchannel protocol deals only in PAGE_SIZE frames.
> 
> Handle this in xennet_make_frags by iterating over the frames which
> make up the page.
> 
> This is the netfront equivalent to 6a8ed462f16b for netback.
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Cc: netdev@vger.kernel.org
> Cc: xen-devel@lists.xen.org
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
> Cc: ANNIE LI <annie.li@oracle.com>
> Cc: Sander Eikelenboom <linux@eikelenboom.it>
> Cc: Stefan Bader <stefan.bader@canonical.com>
> ---
>  drivers/net/xen-netfront.c |   58 +++++++++++++++++++++++++++++++++----------
>  1 files changed, 44 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index caa0110..a12b99a 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -452,24 +452,54 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
>  	/* Grant backend access to each skb fragment page. */
>  	for (i = 0; i < frags; i++) {
>  		skb_frag_t *frag = skb_shinfo(skb)->frags + i;
> +		struct page *page = skb_frag_page(frag);
> +		unsigned long size = skb_frag_size(frag);
> +		unsigned long offset = frag->page_offset;
>  
> -		tx->flags |= XEN_NETTXF_more_data;
> +		/* Data must not cross a page boundary. */
> +		BUG_ON(size + offset > PAGE_SIZE<<compound_order(page));
>  
> -		id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs);
> -		np->tx_skbs[id].skb = skb_get(skb);
> -		tx = RING_GET_REQUEST(&np->tx, prod++);
> -		tx->id = id;
> -		ref = gnttab_claim_grant_reference(&np->gref_tx_head);
> -		BUG_ON((signed short)ref < 0);
> +		/* Skip unused frames from start of page */

'frame' in the comment means an order-0 page ?

> +		page += offset >> PAGE_SHIFT;
> +		offset &= ~PAGE_MASK;
>  
> -		mfn = pfn_to_mfn(page_to_pfn(skb_frag_page(frag)));
> -		gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id,
> -						mfn, GNTMAP_readonly);
> +		while (size > 0) {
> +			unsigned long bytes;
>  
> -		tx->gref = np->grant_tx_ref[id] = ref;
> -		tx->offset = frag->page_offset;
> -		tx->size = skb_frag_size(frag);
> -		tx->flags = 0;
> +			BUG_ON(offset >= PAGE_SIZE);
> +
> +			bytes = PAGE_SIZE - offset;
> +			if (bytes > size)
> +				bytes = size;
> +
> +			tx->flags |= XEN_NETTXF_more_data;
> +
> +			id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs);
> +			np->tx_skbs[id].skb = skb_get(skb);

BTW this skb_get() means extra atomic operations for every 4096 bytes
unit, and an extra atomic op (and test for final 0) at TX completion.
This could be avoided, by setting np->tx_skbs[id].skb = skb only for the
very last unit.
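The cost difference can be sketched with C11 atomics. In this hypothetical model, `struct buf_model` stands in for the skb and `users` for its reference count; it assumes the first slot carries the caller's existing reference, and each further slot either takes its own (as with skb_get() in the patch) or, in the suggested variant, only the last slot records the buffer at all.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical refcounted buffer standing in for an skb. */
struct buf_model {
	atomic_int users; /* reference count */
	int freed;        /* set when the count drops to zero */
};

/* Per ring slot: the buffer to release when this slot completes. */
struct slot_model {
	struct buf_model *buf;
};

/* As in the patch: the first slot uses the caller's reference and every
 * further slot takes its own (one atomic increment each). */
static void fill_ref_per_slot(struct buf_model *b, struct slot_model *s, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (i > 0)
			atomic_fetch_add(&b->users, 1);
		s[i].buf = b;
	}
}

/* As suggested: only the last slot records the buffer, so no extra
 * references are taken and only one completion touches the count. */
static void fill_last_slot_only(struct buf_model *b, struct slot_model *s, size_t n)
{
	for (size_t i = 0; i < n; i++)
		s[i].buf = (i + 1 == n) ? b : NULL;
}

/* TX completion for one slot: drop its reference, free on the last one. */
static void complete_slot(struct slot_model *s)
{
	struct buf_model *b = s->buf;

	s->buf = NULL;
	if (b && atomic_fetch_sub(&b->users, 1) == 1)
		b->freed = 1;
}
```

The model deliberately completes slots in order. As noted in the reply, requests may be acked in any order, so with the second variant the buffer could be released while earlier slots are still outstanding.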

> +			tx = RING_GET_REQUEST(&np->tx, prod++);
> +			tx->id = id;
> +			ref = gnttab_claim_grant_reference(&np->gref_tx_head);
> +			BUG_ON((signed short)ref < 0);
> +
> +			mfn = pfn_to_mfn(page_to_pfn(page));
> +			gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id,
> +							mfn, GNTMAP_readonly);
> +
> +			tx->gref = np->grant_tx_ref[id] = ref;
> +			tx->offset = offset;
> +			tx->size = bytes;
> +			tx->flags = 0;
> +
> +			offset += bytes;
> +			size -= bytes;
> +
> +			/* Next frame */
> +			if (offset == PAGE_SIZE && size) {
> +				BUG_ON(!PageCompound(page));
> +				page++;
> +				offset = 0;
> +			}
> +		}
>  	}
>  
>  	np->tx.req_prod_pvt = prod;

Acked-by: Eric Dumazet <edumazet@google.com>

Thanks !
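The per-frame loop in the patch can be modelled in userspace to see which requests a fragment produces. Assumptions: 4096-byte order-0 frames (`MODEL_FRAME_SIZE`) and a hypothetical `struct chunk` recording the frame index within the compound page, the offset inside that frame and the length. Grant handling and ring bookkeeping are left out.

```c
#include <stddef.h>

#define MODEL_FRAME_SIZE 4096UL /* assumed order-0 page size */

/* One ring request: which frame of the compound page, where, how much. */
struct chunk {
	unsigned long frame;
	unsigned long offset;
	unsigned long bytes;
};

/* Split [offset, offset + size) of a compound page into per-frame
 * chunks, following the loop in the patch. Returns the number of
 * chunks written (at most max). */
static size_t split_frag(unsigned long offset, unsigned long size,
			 struct chunk *out, size_t max)
{
	unsigned long frame = offset / MODEL_FRAME_SIZE; /* skip unused frames */
	size_t n = 0;

	offset %= MODEL_FRAME_SIZE;
	while (size > 0 && n < max) {
		unsigned long bytes = MODEL_FRAME_SIZE - offset;

		if (bytes > size)
			bytes = size;
		out[n].frame = frame;
		out[n].offset = offset;
		out[n].bytes = bytes;
		n++;
		offset += bytes;
		size -= bytes;
		if (offset == MODEL_FRAME_SIZE) { /* next frame */
			frame++;
			offset = 0;
		}
	}
	return n;
}
```

For a fragment at offset 5000 with length 6000, this yields two requests: 3192 bytes at offset 904 of frame 1, then 2808 bytes at offset 0 of frame 2.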

^ permalink raw reply

* Re: [PATCH v2] checkpatch: add double empty line check
From: Andy Whitcroft @ 2012-11-20 14:43 UTC (permalink / raw)
  To: Eilon Greenstein; +Cc: Joe Perches, David Rientjes, linux-kernel, netdev
In-Reply-To: <1353421624.6559.9.camel@lb-tlvb-eilong.il.broadcom.com>

On Tue, Nov 20, 2012 at 04:27:04PM +0200, Eilon Greenstein wrote:
> On Tue, 2012-11-20 at 11:52 +0000, Andy Whitcroft wrote:
> 
> Andy, thanks for reviewing this patch.
> 
> > On Sat, Nov 17, 2012 at 01:17:37PM +0200, Eilon Greenstein wrote:
> > > Changes from previous attempt:
> > > - Use CHK instead of WARN
> > > - Issue only one warning per empty lines block
> > > 
> > > Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
> > > ---
> > >  scripts/checkpatch.pl |    8 ++++++++
> > >  1 file changed, 8 insertions(+)
> > > 
> > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > > index 21a9f5d..13d264f 100755
> > > --- a/scripts/checkpatch.pl
> > > +++ b/scripts/checkpatch.pl
> > > @@ -3579,6 +3579,14 @@ sub process {
> > >  			WARN("EXPORTED_WORLD_WRITABLE",
> > >  			     "Exporting world writable files is usually an error. Consider more restrictive permissions.\n" . $herecurr);
> > >  		}
> > > +
> > > +# check for double empty lines
> > > +		if ($line =~ /^\+\s*$/ &&
> > > +		    ($rawlines[$linenr] =~ /^\s*$/ ||
> > > +		     $prevline =~ /^\+?\s*$/ && $rawlines[$linenr] !~ /^\+\s*$/)) {
> > > +			CHK("DOUBLE_EMPTY_LINE",
> > > +			    "One empty line should be sufficient. Consider removing this one.\n" . $herecurr);
> > > +		}
> > >  	}
> > >  
> > >  	# If we have no input at all, then there is nothing to report on
> > 
> > In your previous version you indicated you would be emitting one per group
> > of lines; I do not see how this does that.
> 
> This is what I'm testing:
> Only if the current line is a new blank line and:
> 	if the next line is empty but not newly added (this is the part that
> will make sure we get only one warning for a bunch of new lines - only
> the last newly added line will hit this condition)
> or
> 	if the previous line was empty (either new empty line or existing empty
> line) and the next line is not a new empty line (so we will issue just
> one warning).
> 
> I tested it on a few examples, and did not see a problem. Can you share an
> example where it issues more than a single warning for a block of newly
> introduced consecutive empty lines?

No, indeed. That was a testing failure on my part.
> 
> > Also this fails if the fragment
> > is at the top of the hunk, emitting a perl warning.
> 
> I did not see this warning. Can you please share this example? I tried
> adding a couple of empty lines at the beginning of a file and it seemed
> to work just fine for me (using perl v5.14.2).

Ok, this is actually if it is at the bottom, not the top.  So if you
have a range of lines newly added to the bottom of the file.  Leading to
this warning:

Use of uninitialized value within @rawlines in pattern match (m//) at
../checkpatch/scripts/checkpatch.pl line 3586.

-apw

^ permalink raw reply

* Re: [PATCH 7/7] netprio_cgroup: allow nesting and inherit config on cgroup creation
From: Tejun Heo @ 2012-11-20 14:40 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: nhorman-2XuSBdqkA4R54TAoqtyWWQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w, tgraf-G/eBtMaohhA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <50AB45EA.2050507-kQCPcA+X3s7YtjvyW6yDsg@public.gmane.org>

Hello, Daniel.

On Tue, Nov 20, 2012 at 09:57:14AM +0100, Daniel Wagner wrote:
> >-static void cgrp_css_free(struct cgroup *cgrp)
> >+static int cgrp_css_online(struct cgroup *cgrp)
> >  {
> >-	struct cgroup_netprio_state *cs = cgrp_netprio_state(cgrp);
> >+	struct cgroup *parent = cgrp->parent;
> >  	struct net_device *dev;
> >+	int ret = 0;
> >+
> >+	if (!parent)
> >+		return 0;
> 
> BTW, parent is always != NULL, because the root cgroup will be
> attached to the dummytop cgroup.

Hmmm?  I'm confused.  When ->css_online() is called for dummytop in
cgroup_init_subsys(), its parent is NULL.  What am I missing?
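A minimal model of the check being discussed, assuming a simplified cgroup with a fixed-size priority array. `struct cg_model`, `MODEL_NDEV` and `cg_online()` are hypothetical: the root is brought online with no parent and keeps its defaults, while any child copies its parent's configuration.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_NDEV 4 /* hypothetical number of network devices */

/* Hypothetical cgroup with one priority per device. */
struct cg_model {
	struct cg_model *parent;
	unsigned int prio[MODEL_NDEV];
};

/* Modelled on the css_online() hunk quoted above: the root is brought
 * online without a parent, so there is nothing to inherit; any other
 * cgroup starts from a copy of its parent's priorities. */
static int cg_online(struct cg_model *cg)
{
	if (!cg->parent)
		return 0;
	memcpy(cg->prio, cg->parent->prio, sizeof(cg->prio));
	return 0;
}
```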

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [PATCH 4/7] netprio_cgroup: reimplement priomap expansion
From: Tejun Heo @ 2012-11-20 14:38 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: nhorman-2XuSBdqkA4R54TAoqtyWWQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w, tgraf-G/eBtMaohhA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <50AB435E.8060901-kQCPcA+X3s7YtjvyW6yDsg@public.gmane.org>

Hello, Daniel.

On Tue, Nov 20, 2012 at 09:46:22AM +0100, Daniel Wagner wrote:
> struct netprio_map {
> 	struct rcu_head rcu;
> 	struct netprio_aux *aux;	/* auxiliary config array */
> 	u32 priomap_len;
> 	u32 priomap[];
> };
> 
> Is there a specific reason why aux and priomap are handled
> differently? Couldn't you just use the same approach for both variables,
> e.g. re/allocating only them here and leaving the allocation of struct
> netprio_map in cgrp_css_alloc()?

->aux is no longer added, so the consistency issue doesn't exist
anymore.  The reason why they were handled differently before (or
rather why I didn't change priomap[] to be allocated separately) was
that pointer chasing tends to be more expensive than offsetting.  I
don't know how much effect it would have in this case but things
sitting in packet in/out paths can be very hot so didn't wanna disturb
it.
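The trade-off between offsetting and pointer chasing can be illustrated with two simplified, hypothetical layouts. With an inline flexible array, `m->prio[idx]` is computed from the map pointer alone; with the split layout, `m->prio` must first be loaded from memory before the index can be applied, adding a dependent load on the hot path.

```c
#include <stdint.h>
#include <stdlib.h>

/* Inline layout: the array follows the header in the same allocation,
 * so prio[idx] is an offset from the map pointer. */
struct inline_map {
	uint32_t len;
	uint32_t prio[];
};

/* Split layout: the array is a separate allocation, so a lookup must
 * first load the prio pointer before it can index through it. */
struct split_map {
	uint32_t len;
	uint32_t *prio;
};

static struct inline_map *inline_map_alloc(uint32_t len)
{
	struct inline_map *m = calloc(1, sizeof(*m) + (size_t)len * sizeof(uint32_t));

	if (m)
		m->len = len;
	return m;
}

static struct split_map *split_map_alloc(uint32_t len)
{
	struct split_map *m = malloc(sizeof(*m));

	if (!m)
		return NULL;
	m->prio = calloc(len, sizeof(*m->prio));
	if (!m->prio) {
		free(m);
		return NULL;
	}
	m->len = len;
	return m;
}

static void split_map_free(struct split_map *m)
{
	if (m)
		free(m->prio);
	free(m);
}

/* Same source for both lookups; the difference is one extra dependent
 * load (m->prio) in the split case. */
static uint32_t inline_lookup(const struct inline_map *m, uint32_t idx)
{
	return idx < m->len ? m->prio[idx] : 0;
}

static uint32_t split_lookup(const struct split_map *m, uint32_t idx)
{
	return idx < m->len ? m->prio[idx] : 0;
}
```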

> Also the algorithm to figure out the size of the array might be a
> bit too aggressive in my opinion. So you always start at
> PRIOMAP_MIN_SIZE and then try to double the size until target_idx
> fits. Wouldn't it make sense to start to look for the new size
> beginning at old->priomap_len and then do the power-of-two increase?

The only downside of always starting from PRIOMAP_MIN_SIZE is
iterating several more times in the sizing loop which isn't really
anything to worry about.  The loop is structured that way because I
wanted to keep the size of the whole thing power-of-two.  Due to the
fields before priomap[], if we size priomap_len power-of-two, we'll
always end up with something slightly over power-of-two, which is
usually the worst size to allocate.
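The sizing rule described above can be sketched in C. This is a userspace model, not the patch itself: `struct prio_map_model` stands in for `struct netprio_map` (its `rcu_head` modelled as two pointers), and `MODEL_MIN_SZ` is an assumed 128-byte minimum. The loop doubles the total allocation size, so header plus `priomap[]` always fills a power-of-two block.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct netprio_map. */
struct prio_map_model {
	void *rcu_next;            /* models struct rcu_head */
	void (*rcu_func)(void *);
	uint32_t priomap_len;
	uint32_t priomap[];
};

#define MODEL_MIN_SZ 128 /* hypothetical minimum allocation in bytes */

/* Return the total allocation size (a power of two) whose priomap[]
 * can hold index target_idx, or 0 on overflow; *len_out receives the
 * resulting number of entries. */
static size_t size_for_idx(uint32_t target_idx, size_t *len_out)
{
	size_t sz = MODEL_MIN_SZ;

	for (;;) {
		size_t len = (sz - offsetof(struct prio_map_model, priomap)) /
			     sizeof(uint32_t);

		if (len > target_idx) {
			*len_out = len;
			return sz;
		}
		if (sz > SIZE_MAX / 2)
			return 0; /* doubling would overflow */
		sz *= 2;
	}
}
```

With this model, `size_for_idx(1000, &len)` returns 4096 bytes on common 32- and 64-bit ABIs, with room for a little over 1000 entries. Sizing `priomap_len` itself as a power of two (1024 entries, 4096 bytes of array) would instead request 4096 bytes plus the header, just over a power of two.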

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [Xen-devel] [PATCH] xen/netfront: handle compound page fragments on transmit
From: Jan Beulich @ 2012-11-20 14:32 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Stefan Bader, Sander Eikelenboom, Eric Dumazet,
	Konrad Rzeszutek Wilk, xen-devel@lists.xen.org, ANNIE LI,
	netdev@vger.kernel.org
In-Reply-To: <1353420865.13542.44.camel@zakaz.uk.xensource.com>

>>> On 20.11.12 at 15:14, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Tue, 2012-11-20 at 13:51 +0000, Jan Beulich wrote:
>> >>> On 20.11.12 at 14:35, Ian Campbell <Ian.Campbell@citrix.com> wrote:
>> > On Tue, 2012-11-20 at 12:28 +0000, Jan Beulich wrote:
>> >> >>> On 20.11.12 at 12:40, Ian Campbell <ian.campbell@citrix.com> wrote:
>> >> > An SKB paged fragment can consist of a compound page with order > 0.
>> >> > However the netchannel protocol deals only in PAGE_SIZE frames.
>> >> > 
>> >> > Handle this in xennet_make_frags by iterating over the frames which
>> >> > make up the page.
>> >> > 
>> >> > This is the netfront equivalent to 6a8ed462f16b for netback.
>> >> 
>> >> Wouldn't you need to be at least a little more conservative here
>> >> with respect to resource use: I realize that get_id_from_freelist()
>> >> return values were never checked, and failure of
>> >> gnttab_claim_grant_reference() was always dealt with via
>> >> BUG_ON(), but considering that netfront_tx_slot_available()
>> >> doesn't account for compound page fragments, I think this (lack
>> >> of) error handling needs improvement in the course of the
>> >> change here (regardless of - I think - someone having said that
>> >> usually the sum of all pages referenced from an skb's fragments
>> >> would not exceed MAX_SKB_FRAGS - "usually" just isn't enough
>> >> imo).
>> > 
>> > I think it is more than "usually", it is derived from the number of
>> > pages needed to contain 64K of data which is the maximum size of the
>> > data associated with an skb (AIUI).
>> > 
>> > Unwinding from failure in xennet_make_frags looks pretty tricky,
>> 
>> Yes, I agree.
>> 
>> > but how about this incremental patch:
>> 
>> Looks good, but can probably be simplified quite a bit:
>> 
>> > --- a/drivers/net/xen-netfront.c
>> > +++ b/drivers/net/xen-netfront.c
>> > @@ -505,6 +505,46 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
>> >  	np->tx.req_prod_pvt = prod;
>> >  }
>> >  
>> > +/*
>> > + * Count how many ring slots are required to send the frags of this
>> > + * skb. Each frag might be a compound page.
>> > + */
>> > +static int xennet_count_skb_frag_pages(struct sk_buff *skb)
>> > +{
>> > +	int i, frags = skb_shinfo(skb)->nr_frags;
>> > +	int pages = 0;
>> > +
>> > +	for (i = 0; i < frags; i++) {
>> > +		skb_frag_t *frag = skb_shinfo(skb)->frags + i;
>> > +		unsigned long size = skb_frag_size(frag);
>> > +		unsigned long offset = frag->page_offset;
>> > +
>> > +		/* Skip unused frames from start of page */
>> > +		offset &= ~PAGE_MASK;
>> > +
>> > +		while (size > 0) {
>> > +			unsigned long bytes;
>> > +
>> > +			BUG_ON(offset >= PAGE_SIZE);
>> > +
>> > +			bytes = PAGE_SIZE - offset;
>> > +			if (bytes > size)
>> > +				bytes = size;
>> > +
>> > +			offset += bytes;
>> > +			size -= bytes;
>> > +
>> > +			/* Next frame */
>> > +			if (offset == PAGE_SIZE && size) {
>> > +				pages++;
>> > +				offset = 0;
>> > +			}
>> > +		}
>> 
>> Isn't the whole loop equivalent to 
>> 
>> 		pages = PFN_UP(offset + size);
>> 
>> (at least as long as size is not zero)?
> 
> Er, yes. Wood for the trees etc...
> 
> I think using PFN_UP overcounts a bit, since the data need not start in
> the first frame of a compound page, but if you keep the
>         /* Skip unused frames from start of page */
>         offset &= ~PAGE_MASK;
>
> I think that does the right thing.

Right, that's what I said (I only wanted the loop to be replaced, not
what was prior to it).
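One way to convince oneself of the equivalence is to compare the loop with the closed form in C. `MODEL_PAGE_SHIFT` (12, i.e. 4096-byte pages) and the helper names are assumptions for illustration; `model_pfn_up()` rounds a byte count up to whole pages, as `PFN_UP()` does. Each chunk of the loop counts as one slot, and the offset is first reduced with `& ~PAGE_MASK` as discussed.

```c
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE (1UL << MODEL_PAGE_SHIFT)
#define MODEL_PAGE_MASK (~(MODEL_PAGE_SIZE - 1))

/* Round a byte count up to whole pages, as PFN_UP() does. */
static unsigned long model_pfn_up(unsigned long x)
{
	return (x + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT;
}

/* Count slots by walking the fragment one frame at a time. */
static unsigned long slots_by_loop(unsigned long offset, unsigned long size)
{
	unsigned long slots = 0;

	offset &= ~MODEL_PAGE_MASK; /* skip unused frames */
	while (size > 0) {
		unsigned long bytes = MODEL_PAGE_SIZE - offset;

		if (bytes > size)
			bytes = size;
		slots++;
		offset += bytes;
		size -= bytes;
		if (offset == MODEL_PAGE_SIZE)
			offset = 0;
	}
	return slots;
}

/* Closed form suggested in the review (valid for size > 0). */
static unsigned long slots_by_formula(unsigned long offset, unsigned long size)
{
	return model_pfn_up((offset & ~MODEL_PAGE_MASK) + size);
}
```

For instance, offset 12298 (three frames plus 10 bytes) with size 8000 gives 2 slots either way, whereas rounding the unmasked offset plus size gives 5, which is the overcount mentioned above.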

> @@ -517,15 +540,16 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	grant_ref_t ref;
>  	unsigned long mfn;
>  	int notify;
> -	int frags = skb_shinfo(skb)->nr_frags;
> +	int slots;
>  	unsigned int offset = offset_in_page(data);
>  	unsigned int len = skb_headlen(skb);
>  	unsigned long flags;
>  
> -	frags += DIV_ROUND_UP(offset + len, PAGE_SIZE);
> -	if (unlikely(frags > MAX_SKB_FRAGS + 1)) {
> -		printk(KERN_ALERT "xennet: skb rides the rocket: %d frags\n",
> -		       frags);
> +	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
> +		xennet_count_skb_frag_slots(skb);
> +	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {

But still - isn't this wrong now (i.e. can't it now validly exceed the
boundary checked for)?

Jan

> +		printk(KERN_ALERT "xennet: skb rides the rocket: %d slots\n",
> +		       slots);
>  		dump_stack();
>  		goto drop;
>  	}

^ permalink raw reply

* Re: [PATCH v2] checkpatch: add double empty line check
From: Eilon Greenstein @ 2012-11-20 14:27 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: Joe Perches, David Rientjes, linux-kernel, netdev
In-Reply-To: <20121120115239.GA7955@dm>

On Tue, 2012-11-20 at 11:52 +0000, Andy Whitcroft wrote:

Andy, thanks for reviewing this patch.

> On Sat, Nov 17, 2012 at 01:17:37PM +0200, Eilon Greenstein wrote:
> > Changes from previous attempt:
> > - Use CHK instead of WARN
> > - Issue only one warning per empty lines block
> > 
> > Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
> > ---
> >  scripts/checkpatch.pl |    8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > index 21a9f5d..13d264f 100755
> > --- a/scripts/checkpatch.pl
> > +++ b/scripts/checkpatch.pl
> > @@ -3579,6 +3579,14 @@ sub process {
> >  			WARN("EXPORTED_WORLD_WRITABLE",
> >  			     "Exporting world writable files is usually an error. Consider more restrictive permissions.\n" . $herecurr);
> >  		}
> > +
> > +# check for double empty lines
> > +		if ($line =~ /^\+\s*$/ &&
> > +		    ($rawlines[$linenr] =~ /^\s*$/ ||
> > +		     $prevline =~ /^\+?\s*$/ && $rawlines[$linenr] !~ /^\+\s*$/)) {
> > +			CHK("DOUBLE_EMPTY_LINE",
> > +			    "One empty line should be sufficient. Consider removing this one.\n" . $herecurr);
> > +		}
> >  	}
> >  
> >  	# If we have no input at all, then there is nothing to report on
> 
> In your previous version you indicated you would be emiting one per group
> of lines, I do not see how this does that.

This is what I'm testing:
Only if the current line is a new blank line and:
	if the next line is empty but not newly added (this is the part that
will make sure we get only one warning for a bunch of new lines - only
the last newly added line will hit this condition)
or
	if the previous line was empty (either new empty line or existing empty
line) and the next line is not a new empty line (so we will issue just
one warning).
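The conditions above can be restated as a C predicate over the lines of a single hunk. This is a userspace sketch, not checkpatch itself. Each string is one diff line that still carries its leading `+`, ` ` or `-` marker, and the helper names are hypothetical. As an added assumption, a neighbour outside the array is treated as absent, meaning it counts as neither blank nor added.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Perl /^\s*$/ : empty or whitespace only. NULL means "no such line". */
static bool is_blank(const char *s)
{
	if (!s)
		return false;
	for (; *s; s++)
		if (!isspace((unsigned char)*s))
			return false;
	return true;
}

/* Perl /^\+\s*$/ : a newly added blank line. */
static bool is_added_blank(const char *s)
{
	return s && s[0] == '+' && is_blank(s + 1);
}

/* Perl /^\+?\s*$/ : a blank line, whether added or pre-existing. */
static bool is_any_blank(const char *s)
{
	return s && is_blank(s[0] == '+' ? s + 1 : s);
}

/* Count how often the v2 DOUBLE_EMPTY_LINE condition would fire. */
static int count_double_empty_v2(const char *const *lines, size_t n)
{
	int hits = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		const char *prev = i > 0 ? lines[i - 1] : NULL;
		const char *next = i + 1 < n ? lines[i + 1] : NULL;

		if (is_added_blank(lines[i]) &&
		    (is_blank(next) ||
		     (is_any_blank(prev) && !is_added_blank(next))))
			hits++;
	}
	return hits;
}
```

Consider a run of three added blank lines between two context lines. Only the last of the three matches, so the count is 1. An added blank line placed directly after an existing blank context line also gives 1. A single added blank line between two non-blank lines gives 0.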

I tested it on a few examples and did not see a problem. Can you share an
example where it issues more than a single warning for a block of newly
introduced consecutive empty lines?

> Also this fails if the fragment
> is at the top of the hunk emiting a perl warning.

I did not see this warning. Can you please share this example? I tried
adding a couple of empty lines at the beginning of a file and it seemed
to work just fine for me (using perl v5.14.2).

> We should probabally
> use the suppress approach.
> 
> How about something like the below.
> 
> -apw
> 
> 
> From 848ebffa8656a1ff96a91788ec0f1c04dab9c3e9 Mon Sep 17 00:00:00 2001
> From: Andy Whitcroft <apw@canonical.com>
> Date: Sat, 17 Nov 2012 13:17:37 +0200
> Subject: [PATCH] checkpatch: strict warning for multiple blank lines
> 
> Signed-off-by: Andy Whitcroft <apw@canonical.com>
> ---
>  scripts/checkpatch.pl |   11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index f18750e..dbc68f3 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -1411,6 +1411,7 @@ sub process {
>  	my %suppress_whiletrailers;
>  	my %suppress_export;
>  	my $suppress_statement = 0;
> +	my $suppress_multipleblank = -1;
>  
>  	# Pre-scan the patch sanitizing the lines.
>  	# Pre-scan the patch looking for any __setup documentation.
> @@ -1521,6 +1522,7 @@ sub process {
>  			%suppress_whiletrailers = ();
>  			%suppress_export = ();
>  			$suppress_statement = 0;
> +			$suppress_multipleblank = -1;
>  			next;
>  
>  # track the line number as we move through the hunk, note that
> @@ -1930,6 +1932,15 @@ sub process {
>  			      "use the SSYNC() macro in asm/blackfin.h\n" . $herevet);
>  		}
>  
> +# multiple blank lines.
> +		if ($line =~ /^-/ || ($suppress_multipleblank == $linenr && $line =~ /^[ \+]\s*$/)) {
> +			$suppress_multipleblank++;
> +		} elsif ($prevline =~ /^\+\s*$/ and $line =~ /^\+\s*$/) {
> +			$suppress_multipleblank = $linenr + 1;
> +			CHK("MULTIPLE_EMPTY_LINE",
> +			    "One empty line should be sufficient. Consider removing this one.\n" . $herecurr);
> +		}
> +
>  # Check for potential 'bare' types
>  		my ($stat, $cond, $line_nr_next, $remain_next, $off_next,
>  		    $realline_next);

The problem with this version is that it only catches newly added empty
lines. But if there was an empty line before or after this newly added
empty line, there will be no warning. I would like to catch things like
that as well.
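The objection can be checked by restating the suppress-based version quoted above as a C loop over one hunk. This is a userspace sketch with hypothetical names. It uses the 0-based array index in the places where checkpatch uses `$linenr`. A run of added blank lines is reported once. An added blank line next to an existing blank context line is not reported at all.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Everything after the one-character diff marker is whitespace. */
static bool tail_is_blank(const char *s)
{
	for (s++; *s; s++)
		if (!isspace((unsigned char)*s))
			return false;
	return true;
}

/* Perl /^\+\s*$/ ; NULL means "no previous line". */
static bool added_blank(const char *s)
{
	return s && s[0] == '+' && tail_is_blank(s);
}

/* Perl /^[ \+]\s*$/ : context or added blank line. */
static bool ctx_or_added_blank(const char *s)
{
	return (s[0] == ' ' || s[0] == '+') && tail_is_blank(s);
}

/* Count MULTIPLE_EMPTY_LINE reports for one hunk. */
static int count_multiple_blank(const char *const *lines, size_t n)
{
	long suppress = -1;
	int hits = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		const char *line = lines[i];
		const char *prev = i > 0 ? lines[i - 1] : NULL;

		if (line[0] == '-' ||
		    (suppress == (long)i && ctx_or_added_blank(line))) {
			/* advance the suppression marker past this line */
			suppress++;
		} else if (added_blank(prev) && added_blank(line)) {
			/* report once, then suppress the rest of the run */
			suppress = (long)i + 1;
			hits++;
		}
	}
	return hits;
}
```

For `{" int a;", "+", "+", "+", " int b;"}` the count is 1. For `{" int a;", " ", "+", " int b;"}` it is 0, which is the case described above.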

Thanks,
Eilon

^ permalink raw reply

* Re: [Xen-devel] [PATCH] xen/netfront: handle compound page fragments on transmit
From: Ian Campbell @ 2012-11-20 14:14 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefan Bader, Sander Eikelenboom, Eric Dumazet,
	Konrad Rzeszutek Wilk, xen-devel@lists.xen.org, ANNIE LI,
	netdev@vger.kernel.org
In-Reply-To: <50AB990602000078000A9F5B@nat28.tlf.novell.com>

On Tue, 2012-11-20 at 13:51 +0000, Jan Beulich wrote:
> >>> On 20.11.12 at 14:35, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > On Tue, 2012-11-20 at 12:28 +0000, Jan Beulich wrote:
> >> >>> On 20.11.12 at 12:40, Ian Campbell <ian.campbell@citrix.com> wrote:
> >> > An SKB paged fragment can consist of a compound page with order > 0.
> >> > However the netchannel protocol deals only in PAGE_SIZE frames.
> >> > 
> >> > Handle this in xennet_make_frags by iterating over the frames which
> >> > make up the page.
> >> > 
> >> > This is the netfront equivalent to 6a8ed462f16b for netback.
> >> 
> >> Wouldn't you need to be at least a little more conservative here
> >> with respect to resource use: I realize that get_id_from_freelist()
> >> return values were never checked, and failure of
> >> gnttab_claim_grant_reference() was always dealt with via
> >> BUG_ON(), but considering that netfront_tx_slot_available()
> >> doesn't account for compound page fragments, I think this (lack
> >> of) error handling needs improvement in the course of the
> >> change here (regardless of - I think - someone having said that
> >> usually the sum of all pages referenced from an skb's fragments
> >> would not exceed MAX_SKB_FRAGS - "usually" just isn't enough
> >> imo).
> > 
> > I think it is more than "usually", it is derived from the number of
> > pages needed to contain 64K of data which is the maximum size of the
> > data associated with an skb (AIUI).
> > 
> > Unwinding from failure in xennet_make_frags looks pretty tricky,
> 
> Yes, I agree.
> 
> > but how about this incremental patch:
> 
> Looks good, but can probably be simplified quite a bit:
> 
> > --- a/drivers/net/xen-netfront.c
> > +++ b/drivers/net/xen-netfront.c
> > @@ -505,6 +505,46 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
> >  	np->tx.req_prod_pvt = prod;
> >  }
> >  
> > +/*
> > + * Count how many ring slots are required to send the frags of this
> > + * skb. Each frag might be a compound page.
> > + */
> > +static int xennet_count_skb_frag_pages(struct sk_buff *skb)
> > +{
> > +	int i, frags = skb_shinfo(skb)->nr_frags;
> > +	int pages = 0;
> > +
> > +	for (i = 0; i < frags; i++) {
> > +		skb_frag_t *frag = skb_shinfo(skb)->frags + i;
> > +		unsigned long size = skb_frag_size(frag);
> > +		unsigned long offset = frag->page_offset;
> > +
> > +		/* Skip unused frames from start of page */
> > +		offset &= ~PAGE_MASK;
> > +
> > +		while (size > 0) {
> > +			unsigned long bytes;
> > +
> > +			BUG_ON(offset >= PAGE_SIZE);
> > +
> > +			bytes = PAGE_SIZE - offset;
> > +			if (bytes > size)
> > +				bytes = size;
> > +
> > +			offset += bytes;
> > +			size -= bytes;
> > +
> > +			/* Next frame */
> > +			if (offset == PAGE_SIZE && size) {
> > +				pages++;
> > +				offset = 0;
> > +			}
> > +		}
> 
> Isn't the whole loop equivalent to 
> 
> 		pages = PFN_UP(offset + size);
> 
> (at least as long as size is not zero)?

Er, yes. Wood for the trees etc...

I think using PFN_UP overcounts a bit, since the data needn't start in the
first frame of a compound page, but if you keep the
        /* Skip unused frames from start of page */
        offset &= ~PAGE_MASK;

I think that does the right thing.
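The point about keeping the mask shows up with a fragment that starts in the second frame of an order-1 compound page. The sketch below assumes 4 KiB pages, and the function names are hypothetical. Without `offset &= ~PAGE_MASK`, `PFN_UP()` also counts the unused leading frame.

```c
#include <assert.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE (1UL << PAGE_SHIFT)
#define PAGE_MASK (~(PAGE_SIZE - 1))
#define PFN_UP(x) (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)

/* Counts from the start of the compound page, including frames before the data. */
static unsigned long slots_unmasked(unsigned long page_offset, unsigned long size)
{
	return PFN_UP(page_offset + size);
}

/* Counts only the frames the data actually touches. */
static unsigned long slots_masked(unsigned long page_offset, unsigned long size)
{
	return PFN_UP((page_offset & ~PAGE_MASK) + size);
}
```

Take `page_offset` 5000 and `size` 1000. The data lies entirely within the second 4 KiB frame, so `slots_masked()` returns 1, while `slots_unmasked()` returns 2.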

> > @@ -517,12 +557,13 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
> >  	grant_ref_t ref;
> >  	unsigned long mfn;
> >  	int notify;
> > -	int frags = skb_shinfo(skb)->nr_frags;
> > +	int frags;
> >  	unsigned int offset = offset_in_page(data);
> >  	unsigned int len = skb_headlen(skb);
> >  	unsigned long flags;
> >  
> > -	frags += DIV_ROUND_UP(offset + len, PAGE_SIZE);
> > +	frags = xennet_count_skb_frag_pages(skb) +
> > +		DIV_ROUND_UP(offset + len, PAGE_SIZE);
> >  	if (unlikely(frags > MAX_SKB_FRAGS + 1)) {
> 
> This condition would now need adjustment, though (because
> "frags" is no longer what its name says).

I think it already wasn't what the name says, since it included the skb
head too. Perhaps "slots" would be a better name?

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index a12b99a..b744875 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -505,6 +505,29 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
 	np->tx.req_prod_pvt = prod;
 }
 
+/*
+ * Count how many ring slots are required to send the frags of this
+ * skb. Each frag might be a compound page.
+ */
+static int xennet_count_skb_frag_slots(struct sk_buff *skb)
+{
+	int i, frags = skb_shinfo(skb)->nr_frags;
+	int pages = 0;
+
+	for (i = 0; i < frags; i++) {
+		skb_frag_t *frag = skb_shinfo(skb)->frags + i;
+		unsigned long size = skb_frag_size(frag);
+		unsigned long offset = frag->page_offset;
+
+		/* Skip unused frames from start of page */
+		offset &= ~PAGE_MASK;
+
+		pages += PFN_UP(offset + size);
+	}
+
+	return pages;
+}
+
 static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	unsigned short id;
@@ -517,15 +540,16 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	grant_ref_t ref;
 	unsigned long mfn;
 	int notify;
-	int frags = skb_shinfo(skb)->nr_frags;
+	int slots;
 	unsigned int offset = offset_in_page(data);
 	unsigned int len = skb_headlen(skb);
 	unsigned long flags;
 
-	frags += DIV_ROUND_UP(offset + len, PAGE_SIZE);
-	if (unlikely(frags > MAX_SKB_FRAGS + 1)) {
-		printk(KERN_ALERT "xennet: skb rides the rocket: %d frags\n",
-		       frags);
+	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
+		xennet_count_skb_frag_slots(skb);
+	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
+		printk(KERN_ALERT "xennet: skb rides the rocket: %d slots\n",
+		       slots);
 		dump_stack();
 		goto drop;
 	}
@@ -533,7 +557,7 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	spin_lock_irqsave(&np->tx_lock, flags);
 
 	if (unlikely(!netif_carrier_ok(dev) ||
-		     (frags > 1 && !xennet_can_sg(dev)) ||
+		     (slots > 1 && !xennet_can_sg(dev)) ||
 		     netif_needs_gso(skb, netif_skb_features(skb)))) {
 		spin_unlock_irqrestore(&np->tx_lock, flags);
 		goto drop;

^ permalink raw reply related

* Re: [Xen-devel] [PATCH] xen/netfront: handle compound page fragments on transmit
From: Jan Beulich @ 2012-11-20 13:51 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Stefan Bader, Sander Eikelenboom, Eric Dumazet,
	Konrad Rzeszutek Wilk, xen-devel@lists.xen.org, ANNIE LI,
	netdev@vger.kernel.org
In-Reply-To: <1353418516.13542.38.camel@zakaz.uk.xensource.com>

>>> On 20.11.12 at 14:35, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Tue, 2012-11-20 at 12:28 +0000, Jan Beulich wrote:
>> >>> On 20.11.12 at 12:40, Ian Campbell <ian.campbell@citrix.com> wrote:
>> > An SKB paged fragment can consist of a compound page with order > 0.
>> > However the netchannel protocol deals only in PAGE_SIZE frames.
>> > 
>> > Handle this in xennet_make_frags by iterating over the frames which
>> > make up the page.
>> > 
>> > This is the netfront equivalent to 6a8ed462f16b for netback.
>> 
>> Wouldn't you need to be at least a little more conservative here
>> with respect to resource use: I realize that get_id_from_freelist()
>> return values were never checked, and failure of
>> gnttab_claim_grant_reference() was always dealt with via
>> BUG_ON(), but considering that netfront_tx_slot_available()
>> doesn't account for compound page fragments, I think this (lack
>> of) error handling needs improvement in the course of the
>> change here (regardless of - I think - someone having said that
>> usually the sum of all pages referenced from an skb's fragments
>> would not exceed MAX_SKB_FRAGS - "usually" just isn't enough
>> imo).
> 
> I think it is more than "usually", it is derived from the number of
> pages needed to contain 64K of data which is the maximum size of the
> data associated with an skb (AIUI).
> 
> Unwinding from failure in xennet_make_frags looks pretty tricky,

Yes, I agree.

> but how about this incremental patch:

Looks good, but can probably be simplified quite a bit:

> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -505,6 +505,46 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
>  	np->tx.req_prod_pvt = prod;
>  }
>  
> +/*
> + * Count how many ring slots are required to send the frags of this
> + * skb. Each frag might be a compound page.
> + */
> +static int xennet_count_skb_frag_pages(struct sk_buff *skb)
> +{
> +	int i, frags = skb_shinfo(skb)->nr_frags;
> +	int pages = 0;
> +
> +	for (i = 0; i < frags; i++) {
> +		skb_frag_t *frag = skb_shinfo(skb)->frags + i;
> +		unsigned long size = skb_frag_size(frag);
> +		unsigned long offset = frag->page_offset;
> +
> +		/* Skip unused frames from start of page */
> +		offset &= ~PAGE_MASK;
> +
> +		while (size > 0) {
> +			unsigned long bytes;
> +
> +			BUG_ON(offset >= PAGE_SIZE);
> +
> +			bytes = PAGE_SIZE - offset;
> +			if (bytes > size)
> +				bytes = size;
> +
> +			offset += bytes;
> +			size -= bytes;
> +
> +			/* Next frame */
> +			if (offset == PAGE_SIZE && size) {
> +				pages++;
> +				offset = 0;
> +			}
> +		}

Isn't the whole loop equivalent to 

		pages = PFN_UP(offset + size);

(at least as long as size is not zero)?

Plus I think the increment of pages would need to be pulled out
of the if() body.
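Both remarks can be checked with a small userspace sketch. It assumes 4 KiB pages, uses `assert()` in place of `BUG_ON()`, and uses hypothetical function names.

- `frames_as_posted()` keeps the posted loop. That loop increments only when it moves into a new frame, so it comes out one short.
- `frames_fixed()` pulls the increment out of the `if ()`. For any non-zero size it agrees with `PFN_UP()` applied to the masked offset.

```c
#include <assert.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE (1UL << PAGE_SHIFT)
#define PAGE_MASK (~(PAGE_SIZE - 1))
#define PFN_UP(x) (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)

/* The loop as posted: pages++ only when moving on to the next frame. */
static unsigned long frames_as_posted(unsigned long offset, unsigned long size)
{
	unsigned long pages = 0;

	offset &= ~PAGE_MASK;
	while (size > 0) {
		unsigned long bytes;

		assert(offset < PAGE_SIZE);	/* BUG_ON() in the driver */
		bytes = PAGE_SIZE - offset;
		if (bytes > size)
			bytes = size;
		offset += bytes;
		size -= bytes;
		if (offset == PAGE_SIZE && size) {
			pages++;
			offset = 0;
		}
	}
	return pages;
}

/* Same walk with the increment pulled out: one slot per frame touched. */
static unsigned long frames_fixed(unsigned long offset, unsigned long size)
{
	unsigned long pages = 0;

	offset &= ~PAGE_MASK;
	while (size > 0) {
		unsigned long bytes;

		assert(offset < PAGE_SIZE);
		bytes = PAGE_SIZE - offset;
		if (bytes > size)
			bytes = size;
		pages++;
		offset += bytes;
		size -= bytes;
		if (offset == PAGE_SIZE && size)
			offset = 0;
	}
	return pages;
}

/* The closed form suggested above, applied to the masked offset. */
static unsigned long frames_pfn_up(unsigned long offset, unsigned long size)
{
	return PFN_UP((offset & ~PAGE_MASK) + size);
}
```

The sketch also shows the zero-size caveat. With `size == 0` both loops return 0, but `frames_pfn_up(100, 0)` is 1.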

> +	}
> +
> +	return pages;
> +}
> +
>  static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  {
>  	unsigned short id;
> @@ -517,12 +557,13 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	grant_ref_t ref;
>  	unsigned long mfn;
>  	int notify;
> -	int frags = skb_shinfo(skb)->nr_frags;
> +	int frags;
>  	unsigned int offset = offset_in_page(data);
>  	unsigned int len = skb_headlen(skb);
>  	unsigned long flags;
>  
> -	frags += DIV_ROUND_UP(offset + len, PAGE_SIZE);
> +	frags = xennet_count_skb_frag_pages(skb) +
> +		DIV_ROUND_UP(offset + len, PAGE_SIZE);
>  	if (unlikely(frags > MAX_SKB_FRAGS + 1)) {

This condition would now need adjustment, though (because
"frags" is no longer what its name says).

Jan

>  		printk(KERN_ALERT "xennet: skb rides the rocket: %d frags\n",
>  		       frags);

^ permalink raw reply

* question about eth_header
From: Dmitry Kravkov @ 2012-11-20 13:44 UTC (permalink / raw)
  To: netdev

Hi,

I am trying to use eth_hdr(), but it looks like it doesn't point to the MAC
header (the MAC header is present at skb->data). Shouldn't
skb_reset_mac_header() be called inside eth_header()?
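A userspace model may help frame the question. In the kernel:

- `eth_hdr()` returns `skb_mac_header()`, which is the position recorded in `mac_header`.
- `skb_reset_mac_header()` records the current `skb->data` there.
- `eth_header()` builds the header with `skb_push()`.

The sketch below uses a hypothetical, heavily reduced `model_skb` that stores `mac_header` as an offset from `head`. It shows that after a push, the recorded MAC position still refers to the old location until it is reset. It does not claim what `eth_header()` ought to do.

```c
#include <assert.h>
#include <stddef.h>

#define ETH_HLEN 14

/* Hypothetical, heavily reduced model of struct sk_buff. */
struct model_skb {
	unsigned char *head;	/* start of the buffer */
	unsigned char *data;	/* start of valid data */
	size_t mac_header;	/* offset of the MAC header from head */
};

/* Like skb_push(): grow the data area towards head. */
static unsigned char *model_push(struct model_skb *skb, size_t len)
{
	skb->data -= len;
	return skb->data;
}

/* Like skb_reset_mac_header(): record the current data position. */
static void model_reset_mac_header(struct model_skb *skb)
{
	skb->mac_header = (size_t)(skb->data - skb->head);
}

/* Like eth_hdr(): whatever mac_header currently points at. */
static unsigned char *model_eth_hdr(const struct model_skb *skb)
{
	return skb->head + skb->mac_header;
}
```

Suppose `data` starts 64 bytes into the buffer and `mac_header` is reset there. Pushing `ETH_HLEN` bytes puts the new header at offset 50, but `model_eth_hdr()` still returns offset 64. It matches `skb->data` only after another reset.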

Thanks.

^ permalink raw reply

* Re: [PATCH] xen/netfront: handle compound page fragments on transmit
From: Sander Eikelenboom @ 2012-11-20 13:45 UTC (permalink / raw)
  To: Stefan Bader
  Cc: Ian Campbell, netdev, xen-devel, Eric Dumazet,
	Konrad Rzeszutek Wilk, ANNIE LI
In-Reply-To: <50AB860E.3090106@canonical.com>


Tuesday, November 20, 2012, 2:30:54 PM, you wrote:

> Aside from Jans comments about error handling, I tried below patch and it seems
> to solve the problem with transfers out of the domU for me (though only shallow
> testing done, otoh 5 times is more than getting stuck the first time).

> -Stefan

I'm running with this patch now; it seems to fix the problems for me as well.

--
Sander

> On 20.11.2012 12:40, Ian Campbell wrote:
>> An SKB paged fragment can consist of a compound page with order > 0.
>> However the netchannel protocol deals only in PAGE_SIZE frames.
>> 
>> Handle this in xennet_make_frags by iterating over the frames which
>> make up the page.
>> 
>> This is the netfront equivalent to 6a8ed462f16b for netback.
>> 
>> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
>> Cc: netdev@vger.kernel.org
>> Cc: xen-devel@lists.xen.org
>> Cc: Eric Dumazet <edumazet@google.com>
>> Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
>> Cc: ANNIE LI <annie.li@oracle.com>
>> Cc: Sander Eikelenboom <linux@eikelenboom.it>
>> Cc: Stefan Bader <stefan.bader@canonical.com>
> Tested-by: Stefan Bader <stefan.bader@canonical.com>
>> ---
>>  drivers/net/xen-netfront.c |   58 +++++++++++++++++++++++++++++++++----------
>>  1 files changed, 44 insertions(+), 14 deletions(-)
>> 
>> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
>> index caa0110..a12b99a 100644
>> --- a/drivers/net/xen-netfront.c
>> +++ b/drivers/net/xen-netfront.c
>> @@ -452,24 +452,54 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
>>       /* Grant backend access to each skb fragment page. */
>>       for (i = 0; i < frags; i++) {
>>               skb_frag_t *frag = skb_shinfo(skb)->frags + i;
>> +             struct page *page = skb_frag_page(frag);
>> +             unsigned long size = skb_frag_size(frag);
>> +             unsigned long offset = frag->page_offset;
>>  
>> -             tx->flags |= XEN_NETTXF_more_data;
>> +             /* Data must not cross a page boundary. */
>> +             BUG_ON(size + offset > PAGE_SIZE<<compound_order(page));
>>  
>> -             id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs);
>> -             np->tx_skbs[id].skb = skb_get(skb);
>> -             tx = RING_GET_REQUEST(&np->tx, prod++);
>> -             tx->id = id;
>> -             ref = gnttab_claim_grant_reference(&np->gref_tx_head);
>> -             BUG_ON((signed short)ref < 0);
>> +             /* Skip unused frames from start of page */
>> +             page += offset >> PAGE_SHIFT;
>> +             offset &= ~PAGE_MASK;
>>  
>> -             mfn = pfn_to_mfn(page_to_pfn(skb_frag_page(frag)));
>> -             gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id,
>> -                                             mfn, GNTMAP_readonly);
>> +             while (size > 0) {
>> +                     unsigned long bytes;
>>  
>> -             tx->gref = np->grant_tx_ref[id] = ref;
>> -             tx->offset = frag->page_offset;
>> -             tx->size = skb_frag_size(frag);
>> -             tx->flags = 0;
>> +                     BUG_ON(offset >= PAGE_SIZE);
>> +
>> +                     bytes = PAGE_SIZE - offset;
>> +                     if (bytes > size)
>> +                             bytes = size;
>> +
>> +                     tx->flags |= XEN_NETTXF_more_data;
>> +
>> +                     id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs);
>> +                     np->tx_skbs[id].skb = skb_get(skb);
>> +                     tx = RING_GET_REQUEST(&np->tx, prod++);
>> +                     tx->id = id;
>> +                     ref = gnttab_claim_grant_reference(&np->gref_tx_head);
>> +                     BUG_ON((signed short)ref < 0);
>> +
>> +                     mfn = pfn_to_mfn(page_to_pfn(page));
>> +                     gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id,
>> +                                                     mfn, GNTMAP_readonly);
>> +
>> +                     tx->gref = np->grant_tx_ref[id] = ref;
>> +                     tx->offset = offset;
>> +                     tx->size = bytes;
>> +                     tx->flags = 0;
>> +
>> +                     offset += bytes;
>> +                     size -= bytes;
>> +
>> +                     /* Next frame */
>> +                     if (offset == PAGE_SIZE && size) {
>> +                             BUG_ON(!PageCompound(page));
>> +                             page++;
>> +                             offset = 0;
>> +                     }
>> +             }
>>       }
>>  
>>       np->tx.req_prod_pvt = prod;
>> 

^ permalink raw reply

* Re: [Xen-devel] [PATCH] xen/netfront: handle compound page fragments on transmit
From: Ian Campbell @ 2012-11-20 13:35 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefan Bader, Sander Eikelenboom, Eric Dumazet,
	Konrad Rzeszutek Wilk, xen-devel@lists.xen.org, ANNIE LI,
	netdev@vger.kernel.org
In-Reply-To: <50AB856D02000078000A9EFD@nat28.tlf.novell.com>

On Tue, 2012-11-20 at 12:28 +0000, Jan Beulich wrote:
> >>> On 20.11.12 at 12:40, Ian Campbell <ian.campbell@citrix.com> wrote:
> > An SKB paged fragment can consist of a compound page with order > 0.
> > However the netchannel protocol deals only in PAGE_SIZE frames.
> > 
> > Handle this in xennet_make_frags by iterating over the frames which
> > make up the page.
> > 
> > This is the netfront equivalent to 6a8ed462f16b for netback.
> 
> Wouldn't you need to be at least a little more conservative here
> with respect to resource use: I realize that get_id_from_freelist()
> return values were never checked, and failure of
> gnttab_claim_grant_reference() was always dealt with via
> BUG_ON(), but considering that netfront_tx_slot_available()
> doesn't account for compound page fragments, I think this (lack
> of) error handling needs improvement in the course of the
> change here (regardless of - I think - someone having said that
> usually the sum of all pages referenced from an skb's fragments
> would not exceed MAX_SKB_FRAGS - "usually" just isn't enough
> imo).

I think it is more than "usually", it is derived from the number of
pages needed to contain 64K of data which is the maximum size of the
data associated with an skb (AIUI).

Unwinding from failure in xennet_make_frags looks pretty tricky, but how
about this incremental patch:

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index a12b99a..06d0a84 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -505,6 +505,46 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
 	np->tx.req_prod_pvt = prod;
 }
 
+/*
+ * Count how many ring slots are required to send the frags of this
+ * skb. Each frag might be a compound page.
+ */
+static int xennet_count_skb_frag_pages(struct sk_buff *skb)
+{
+	int i, frags = skb_shinfo(skb)->nr_frags;
+	int pages = 0;
+
+	for (i = 0; i < frags; i++) {
+		skb_frag_t *frag = skb_shinfo(skb)->frags + i;
+		unsigned long size = skb_frag_size(frag);
+		unsigned long offset = frag->page_offset;
+
+		/* Skip unused frames from start of page */
+		offset &= ~PAGE_MASK;
+
+		while (size > 0) {
+			unsigned long bytes;
+
+			BUG_ON(offset >= PAGE_SIZE);
+
+			bytes = PAGE_SIZE - offset;
+			if (bytes > size)
+				bytes = size;
+
+			offset += bytes;
+			size -= bytes;
+
+			/* Next frame */
+			if (offset == PAGE_SIZE && size) {
+				pages++;
+				offset = 0;
+			}
+		}
+	}
+
+	return pages;
+}
+
 static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	unsigned short id;
@@ -517,12 +557,13 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	grant_ref_t ref;
 	unsigned long mfn;
 	int notify;
-	int frags = skb_shinfo(skb)->nr_frags;
+	int frags;
 	unsigned int offset = offset_in_page(data);
 	unsigned int len = skb_headlen(skb);
 	unsigned long flags;
 
-	frags += DIV_ROUND_UP(offset + len, PAGE_SIZE);
+	frags = xennet_count_skb_frag_pages(skb) +
+		DIV_ROUND_UP(offset + len, PAGE_SIZE);
 	if (unlikely(frags > MAX_SKB_FRAGS + 1)) {
 		printk(KERN_ALERT "xennet: skb rides the rocket: %d frags\n",
 		       frags);

^ permalink raw reply related

* Re: [PATCH] xen/netfront: handle compound page fragments on transmit
From: Stefan Bader @ 2012-11-20 13:30 UTC (permalink / raw)
  To: Ian Campbell
  Cc: netdev, xen-devel, Eric Dumazet, Konrad Rzeszutek Wilk, ANNIE LI,
	Sander Eikelenboom
In-Reply-To: <1353411606-15940-1-git-send-email-ian.campbell@citrix.com>

[-- Attachment #1: Type: text/plain, Size: 3619 bytes --]

Aside from Jan's comments about error handling, I tried the patch below and it seems
to solve the problem with transfers out of the domU for me (though I only did shallow
testing; otoh 5 times is more than getting stuck the first time).

-Stefan

On 20.11.2012 12:40, Ian Campbell wrote:
> An SKB paged fragment can consist of a compound page with order > 0.
> However the netchannel protocol deals only in PAGE_SIZE frames.
> 
> Handle this in xennet_make_frags by iterating over the frames which
> make up the page.
> 
> This is the netfront equivalent to 6a8ed462f16b for netback.
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Cc: netdev@vger.kernel.org
> Cc: xen-devel@lists.xen.org
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
> Cc: ANNIE LI <annie.li@oracle.com>
> Cc: Sander Eikelenboom <linux@eikelenboom.it>
> Cc: Stefan Bader <stefan.bader@canonical.com>
Tested-by: Stefan Bader <stefan.bader@canonical.com>
> ---
>  drivers/net/xen-netfront.c |   58 +++++++++++++++++++++++++++++++++----------
>  1 files changed, 44 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index caa0110..a12b99a 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -452,24 +452,54 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
>  	/* Grant backend access to each skb fragment page. */
>  	for (i = 0; i < frags; i++) {
>  		skb_frag_t *frag = skb_shinfo(skb)->frags + i;
> +		struct page *page = skb_frag_page(frag);
> +		unsigned long size = skb_frag_size(frag);
> +		unsigned long offset = frag->page_offset;
>  
> -		tx->flags |= XEN_NETTXF_more_data;
> +		/* Data must not cross a page boundary. */
> +		BUG_ON(size + offset > PAGE_SIZE<<compound_order(page));
>  
> -		id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs);
> -		np->tx_skbs[id].skb = skb_get(skb);
> -		tx = RING_GET_REQUEST(&np->tx, prod++);
> -		tx->id = id;
> -		ref = gnttab_claim_grant_reference(&np->gref_tx_head);
> -		BUG_ON((signed short)ref < 0);
> +		/* Skip unused frames from start of page */
> +		page += offset >> PAGE_SHIFT;
> +		offset &= ~PAGE_MASK;
>  
> -		mfn = pfn_to_mfn(page_to_pfn(skb_frag_page(frag)));
> -		gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id,
> -						mfn, GNTMAP_readonly);
> +		while (size > 0) {
> +			unsigned long bytes;
>  
> -		tx->gref = np->grant_tx_ref[id] = ref;
> -		tx->offset = frag->page_offset;
> -		tx->size = skb_frag_size(frag);
> -		tx->flags = 0;
> +			BUG_ON(offset >= PAGE_SIZE);
> +
> +			bytes = PAGE_SIZE - offset;
> +			if (bytes > size)
> +				bytes = size;
> +
> +			tx->flags |= XEN_NETTXF_more_data;
> +
> +			id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs);
> +			np->tx_skbs[id].skb = skb_get(skb);
> +			tx = RING_GET_REQUEST(&np->tx, prod++);
> +			tx->id = id;
> +			ref = gnttab_claim_grant_reference(&np->gref_tx_head);
> +			BUG_ON((signed short)ref < 0);
> +
> +			mfn = pfn_to_mfn(page_to_pfn(page));
> +			gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id,
> +							mfn, GNTMAP_readonly);
> +
> +			tx->gref = np->grant_tx_ref[id] = ref;
> +			tx->offset = offset;
> +			tx->size = bytes;
> +			tx->flags = 0;
> +
> +			offset += bytes;
> +			size -= bytes;
> +
> +			/* Next frame */
> +			if (offset == PAGE_SIZE && size) {
> +				BUG_ON(!PageCompound(page));
> +				page++;
> +				offset = 0;
> +			}
> +		}
>  	}
>  
>  	np->tx.req_prod_pvt = prod;
> 
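The new inner loop in `xennet_make_frags()` turns one fragment into one request per 4 KiB frame. The sketch below performs the same split without the grant-table and ring calls. It runs in userspace, assumes 4 KiB pages, and uses hypothetical names. For each request it records three values, which correspond to `page`, `tx->offset` and `tx->size` in the patch:

- the frame index within the compound page
- the offset inside that frame
- the byte count

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE (1UL << PAGE_SHIFT)
#define PAGE_MASK (~(PAGE_SIZE - 1))

/* One would-be ring request: which frame, where in it, how many bytes. */
struct tx_chunk {
	unsigned long frame;	/* frame index within the compound page */
	unsigned long offset;	/* becomes tx->offset */
	unsigned long size;	/* becomes tx->size */
};

/*
 * Split [page_offset, page_offset + size) into per-frame chunks.
 * Returns the number of chunks written (at most max).
 */
static size_t split_frag(unsigned long page_offset, unsigned long size,
			 struct tx_chunk *out, size_t max)
{
	unsigned long frame = page_offset >> PAGE_SHIFT; /* skip unused frames */
	unsigned long offset = page_offset & ~PAGE_MASK;
	size_t n = 0;

	while (size > 0 && n < max) {
		unsigned long bytes = PAGE_SIZE - offset;

		if (bytes > size)
			bytes = size;
		out[n].frame = frame;
		out[n].offset = offset;
		out[n].size = bytes;
		n++;

		offset += bytes;
		size -= bytes;
		if (offset == PAGE_SIZE && size) {	/* next frame */
			frame++;
			offset = 0;
		}
	}
	return n;
}
```

For example, a 6000-byte fragment at `page_offset` 5000 yields two requests. The first carries 3192 bytes at offset 904 of frame 1, and the second carries 2808 bytes at offset 0 of frame 2.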



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 897 bytes --]

^ permalink raw reply

* Re: 82571EB: Detected Hardware Unit Hang
From: Joe Jin @ 2012-11-20 13:24 UTC (permalink / raw)
  To: Dave, Tushar N
  Cc: e1000-devel@lists.sf.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Mary Mcgrath
In-Reply-To: <061C8A8601E8EE4CA8D8FD6990CEA8913349EB41@ORSMSX102.amr.corp.intel.com>

On 11/20/12 16:59, Dave, Tushar N wrote:
> Have you power off the system completely after modifying eeprom? If not please do so.

Hi Tushar,

That does not seem to work for me. Would you please help check what is wrong with my operations?

Original eeprom dump:

# ethtool -e eth3 | head -8
Offset		Values
------		------
0x0000		00 15 17 16 ee 9a 24 05 ff ff a2 50 ff ff ff ff 
0x0010		57 d4 07 74 2f a4 a4 11 86 80 a4 10 86 80 65 b1 
0x0020		08 00 a4 10 00 58 00 00 01 50 00 00 00 00 00 01 
0x0030		f6 6c b0 37 a6 07 03 84 83 07 00 00 03 c3 02 06 
                            ^^^^^
0x0040		08 00 f0 0e 64 21 40 00 01 40 00 00 00 00 00 00 
0x0050		00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 

# lspci -s 0000:52:00.1 -vvv
52:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
<--snip-->
	Capabilities: [e0] Express (v1) Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 4096 bytes
                        ^^^^^^^^^^^^^^^^^^^^^
<--snip-->

# ethtool eth3
Settings for eth3:
	Supported ports: [ TP ]
	Supported link modes:   10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Full 
	Supports auto-negotiation: Yes
	Advertised link modes:  10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Full 
	Advertised pause frame use: No
	Advertised auto-negotiation: Yes
	Speed: 1000Mb/s
	Duplex: Full
	Port: Twisted Pair
	PHYAD: 1
	Transceiver: internal
	Auto-negotiation: on
	MDI-X: off
	Supports Wake-on: d
	Wake-on: d
	Current message level: 0x00000007 (7)
	Link detected: yes

# ethtool -E eth3 magic 0x10a48086 offset 0x34 value 0xa7
# ethtool -e eth3 | head -8
Offset		Values
------		------
0x0000		00 15 17 16 ee 9a 24 05 ff ff a2 50 ff ff ff ff 
0x0010		57 d4 07 74 2f a4 a4 11 86 80 a4 10 86 80 65 b1 
0x0020		08 00 a4 10 00 58 00 00 01 50 00 00 00 00 00 01 
0x0030		f6 6c b0 37 a7 07 03 84 83 07 00 00 03 c3 02 06 
                            ^^^^^ <== a6 --> a7
0x0040		08 00 f0 0e 64 21 40 00 01 40 00 00 00 00 00 00 
0x0050		00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 

# reboot

# ethtool -e eth3 | head -8
Offset		Values
------		------
0x0000		00 15 17 16 ee 9a 24 05 ff ff a2 50 ff ff ff ff 
0x0010		57 d4 07 74 2f a4 a4 11 86 80 a4 10 86 80 65 b1 
0x0020		08 00 a4 10 00 58 00 00 01 50 00 00 00 00 00 01 
0x0030		f6 6c b0 37 a7 07 03 84 83 07 00 00 03 c3 02 06 
0x0040		08 00 f0 0e 64 21 40 00 01 40 00 00 00 00 00 00 
0x0050		00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 

# lspci -s 0000:52:00.1 -vvv
52:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
<--snip-->
	Capabilities: [e0] Express (v1) Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 4096 bytes
                        ^^^^^^^^^^^^^^^^^^^^^
		DevSta:	CorrErr- UncorrErr- FatalErr+ UnsuppReq+ AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x4, ASPM L0s, Latency L0 <4us, L1 <64us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
<--snip-->

#  ethtool -E eth3 magic 0x10a48086 offset 0x35 value 0x17

# ethtool -e eth3 | head -8
Offset		Values
------		------
0x0000		00 15 17 16 ee 9a 24 05 ff ff a2 50 ff ff ff ff 
0x0010		57 d4 07 74 2f a4 a4 11 86 80 a4 10 86 80 65 b1 
0x0020		08 00 a4 10 00 58 00 00 01 50 00 00 00 00 00 01 
0x0030		f6 6c b0 37 a6 17 03 84 83 07 00 00 03 c3 02 06 
                            ^^^^^<== 07 -> 17
0x0040		08 00 f0 0e 64 21 40 00 01 40 00 00 00 00 00 00 
0x0050		00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 

# reboot

# ethtool -e eth3 | head -8
Offset		Values
------		------
0x0000		00 15 17 16 ee 9a 24 05 ff ff a2 50 ff ff ff ff 
0x0010		57 d4 07 74 2f a4 a4 11 86 80 a4 10 86 80 65 b1 
0x0020		08 00 a4 10 00 58 00 00 01 50 00 00 00 00 00 01 
0x0030		f6 6c b0 37 a6 17 03 84 83 07 00 00 03 c3 02 06 
                            ^^^^^<== 07 -> 17
0x0040		08 00 f0 0e 64 21 40 00 01 40 00 00 00 00 00 00 
0x0050		00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 

# lspci -s 0000:52:00.1 -vvv
52:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
<--snip-->
	Capabilities: [e0] Express (v1) Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 4096 bytes
                        ^^^^^^^^^^^^^^^^^^^^^
		DevSta:	CorrErr- UncorrErr- FatalErr+ UnsuppReq+ AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x4, ASPM L0s, Latency L0 <4us, L1 <64us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
<--snip-->

Thanks,
Joe
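For reference, the `magic` argument in the `ethtool -E` commands above is not arbitrary. As far as I understand the e1000e driver, it rejects writes unless the value equals the adapter's PCI IDs packed as `vendor | (device << 16)`. 0x10a48086 therefore decodes to vendor 0x8086 (Intel) and device 0x10a4, and its little-endian bytes `86 80 a4 10` also appear in that order at bytes 0x18-0x1B of the dump. The helpers below are hypothetical illustrations of that packing, not driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack PCI IDs the way e1000e is assumed to
 * build its ethtool EEPROM magic (device ID in the high half). */
static uint32_t eeprom_magic(uint16_t vendor, uint16_t device)
{
	return (uint32_t)vendor | ((uint32_t)device << 16);
}

/* Split a magic value back into its vendor and device halves. */
static void eeprom_magic_split(uint32_t magic, uint16_t *vendor,
			       uint16_t *device)
{
	*vendor = (uint16_t)(magic & 0xffff);
	*device = (uint16_t)(magic >> 16);
}
```

A wrong magic makes the write fail outright, so it is a quick sanity check before suspecting the EEPROM contents themselves.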


* Re: 82571EB: Detected Hardware Unit Hang
From: Joe Jin @ 2012-11-20 13:24 UTC (permalink / raw)
  To: Dave, Tushar N
  Cc: e1000-devel@lists.sf.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Mary Mcgrath
In-Reply-To: <061C8A8601E8EE4CA8D8FD6990CEA8913349EB41@ORSMSX102.amr.corp.intel.com>

On 11/20/12 16:59, Dave, Tushar N wrote:
> Have you powered off the system completely after modifying the EEPROM? If not, please do so.

It doesn't seem to work for me. Would you please help me check what is wrong with my steps?

Original eeprom dump:

# ethtool -e eth3 | head -8
Offset		Values
------		------
0x0000		00 15 17 16 ee 9a 24 05 ff ff a2 50 ff ff ff ff 
0x0010		57 d4 07 74 2f a4 a4 11 86 80 a4 10 86 80 65 b1 
0x0020		08 00 a4 10 00 58 00 00 01 50 00 00 00 00 00 01 
0x0030		f6 6c b0 37 a6 07 03 84 83 07 00 00 03 c3 02 06 
                            ^^^^^
0x0040		08 00 f0 0e 64 21 40 00 01 40 00 00 00 00 00 00 
0x0050		00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 

# lspci -s 0000:52:00.1 -vvv
52:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
<--snip-->
	Capabilities: [e0] Express (v1) Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 4096 bytes
                        ^^^^^^^^^^^^^^^^^^^^^
<--snip-->

# ethtool eth3
Settings for eth3:
	Supported ports: [ TP ]
	Supported link modes:   10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Full 
	Supports auto-negotiation: Yes
	Advertised link modes:  10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Full 
	Advertised pause frame use: No
	Advertised auto-negotiation: Yes
	Speed: 1000Mb/s
	Duplex: Full
	Port: Twisted Pair
	PHYAD: 1
	Transceiver: internal
	Auto-negotiation: on
	MDI-X: off
	Supports Wake-on: d
	Wake-on: d
	Current message level: 0x00000007 (7)
	Link detected: yes

# ethtool -E eth3 magic 0x10a48086 offset 0x34 value 0xa7
# ethtool -e eth3 | head -8
Offset		Values
------		------
0x0000		00 15 17 16 ee 9a 24 05 ff ff a2 50 ff ff ff ff 
0x0010		57 d4 07 74 2f a4 a4 11 86 80 a4 10 86 80 65 b1 
0x0020		08 00 a4 10 00 58 00 00 01 50 00 00 00 00 00 01 
0x0030		f6 6c b0 37 a7 07 03 84 83 07 00 00 03 c3 02 06 
                            ^^^^^ <== a6 --> a7
0x0040		08 00 f0 0e 64 21 40 00 01 40 00 00 00 00 00 00 
0x0050		00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 

# reboot

# ethtool -e eth3 | head -8
Offset		Values
------		------
0x0000		00 15 17 16 ee 9a 24 05 ff ff a2 50 ff ff ff ff 
0x0010		57 d4 07 74 2f a4 a4 11 86 80 a4 10 86 80 65 b1 
0x0020		08 00 a4 10 00 58 00 00 01 50 00 00 00 00 00 01 
0x0030		f6 6c b0 37 a7 07 03 84 83 07 00 00 03 c3 02 06 
0x0040		08 00 f0 0e 64 21 40 00 01 40 00 00 00 00 00 00 
0x0050		00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 

# lspci -s 0000:52:00.1 -vvv
52:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
<--snip-->
	Capabilities: [e0] Express (v1) Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 4096 bytes
                        ^^^^^^^^^^^^^^^^^^^^^
		DevSta:	CorrErr- UncorrErr- FatalErr+ UnsuppReq+ AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x4, ASPM L0s, Latency L0 <4us, L1 <64us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
<--snip-->

#  ethtool -E eth3 magic 0x10a48086 offset 0x35 value 0x17

# ethtool -e eth3 | head -8
Offset		Values
------		------
0x0000		00 15 17 16 ee 9a 24 05 ff ff a2 50 ff ff ff ff 
0x0010		57 d4 07 74 2f a4 a4 11 86 80 a4 10 86 80 65 b1 
0x0020		08 00 a4 10 00 58 00 00 01 50 00 00 00 00 00 01 
0x0030		f6 6c b0 37 a6 17 03 84 83 07 00 00 03 c3 02 06 
                            ^^^^^<== 07 -> 17
0x0040		08 00 f0 0e 64 21 40 00 01 40 00 00 00 00 00 00 
0x0050		00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 

# reboot

# ethtool -e eth3 | head -8
Offset		Values
------		------
0x0000		00 15 17 16 ee 9a 24 05 ff ff a2 50 ff ff ff ff 
0x0010		57 d4 07 74 2f a4 a4 11 86 80 a4 10 86 80 65 b1 
0x0020		08 00 a4 10 00 58 00 00 01 50 00 00 00 00 00 01 
0x0030		f6 6c b0 37 a6 17 03 84 83 07 00 00 03 c3 02 06 
                            ^^^^^<== 07 -> 17
0x0040		08 00 f0 0e 64 21 40 00 01 40 00 00 00 00 00 00 
0x0050		00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 

# lspci -s 0000:52:00.1 -vvv
52:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
<--snip-->
	Capabilities: [e0] Express (v1) Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 4096 bytes
                        ^^^^^^^^^^^^^^^^^^^^^
		DevSta:	CorrErr- UncorrErr- FatalErr+ UnsuppReq+ AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x4, ASPM L0s, Latency L0 <4us, L1 <64us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
<--snip-->

Thanks,
Joe
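A note on reading these dumps: `ethtool -e` prints the EEPROM byte by byte, while Intel NVM contents are normally handled as little-endian 16-bit words. Byte offsets 0x34 and 0x35 are thus the low and high bytes of word 0x1A, and the two writes above touch the same word. The sketch below (helper names are hypothetical) rebuilds that word from the 0x0030 row of the original dump (`a6 07`) and of the dump after the `offset 0x35 value 0x17` write (`a6 17`). It only decodes the dump; it says nothing about what the word's bits control.

```c
#include <assert.h>
#include <stdint.h>

/* Row 0x0030 of the original dump and after the 0x35 write. */
static const uint8_t row30_orig[16] = {
	0xf6, 0x6c, 0xb0, 0x37, 0xa6, 0x07, 0x03, 0x84,
	0x83, 0x07, 0x00, 0x00, 0x03, 0xc3, 0x02, 0x06,
};
static const uint8_t row30_after[16] = {
	0xf6, 0x6c, 0xb0, 0x37, 0xa6, 0x17, 0x03, 0x84,
	0x83, 0x07, 0x00, 0x00, 0x03, 0xc3, 0x02, 0x06,
};

/* Hypothetical helper: NVM word index that holds a given byte offset. */
static unsigned int nvm_word_index(unsigned int byte_offset)
{
	return byte_offset / 2;
}

/* Hypothetical helper: assemble a little-endian 16-bit word from a
 * 16-byte dump row that starts at byte offset row_base. */
static uint16_t nvm_word(const uint8_t *row, unsigned int row_base,
			 unsigned int word_index)
{
	unsigned int off = word_index * 2 - row_base;

	assert(word_index * 2 >= row_base && off + 1 < 16);
	return (uint16_t)(row[off] | (row[off + 1] << 8));
}
```

With these helpers, word 0x1A reads 0x07a6 originally and 0x17a6 after the write, i.e. only bit 12 changed.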


* [PATCHv3 net-next] add DOVE extensions for VXLAN
From: David L Stevens @ 2012-11-20 12:50 UTC (permalink / raw)
  To: David Miller, Stephen Hemminger; +Cc: netdev


	This patch provides extensions to VXLAN for supporting Distributed
Overlay Virtual Ethernet (DOVE) networks. The patch includes:

	+ a dove flag per VXLAN device to enable DOVE extensions
	+ ARP reduction, whereby a bridge-connected VXLAN tunnel endpoint
		answers ARP requests from the local bridge on behalf of
		remote DOVE clients
	+ route short-circuiting (aka L3 switching). Known destination IP
		addresses use the corresponding destination MAC address for
		switching rather than going to a (possibly remote) router first.
	+ netlink notification messages for forwarding table and L3 switching
		misses
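ARP reduction depends on pulling the sender and target IPv4 addresses out of the request. The user-space sketch below mirrors the pointer walk in `arp_reduce()` (sha, sip, tha, tip after the fixed header) and the same header checks for an Ethernet/IPv4 request. `struct arp_fixed` and `parse_arp_request()` are hypothetical stand-ins for the kernel's `struct arphdr` and the inline checks in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Fixed part of an ARP header (same layout as struct arphdr). */
struct arp_fixed {
	uint16_t ar_hrd;	/* hardware type, network order */
	uint16_t ar_pro;	/* protocol type, network order */
	uint8_t  ar_hln;	/* hardware address length */
	uint8_t  ar_pln;	/* protocol address length */
	uint16_t ar_op;		/* opcode, network order */
};

/* Return true and fill sip/tip (network order) for an Ethernet/IPv4
 * ARP request; false for anything arp_reduce() would skip. */
static bool parse_arp_request(const uint8_t *buf, size_t len,
			      uint32_t *sip, uint32_t *tip)
{
	struct arp_fixed h;
	const uint8_t *p;

	if (len < sizeof(h) + 2 * (6 + 4))
		return false;
	memcpy(&h, buf, sizeof(h));
	if ((ntohs(h.ar_hrd) != 1 && ntohs(h.ar_hrd) != 6) || /* ETHER, IEEE802 */
	    ntohs(h.ar_pro) != 0x0800 ||	/* ETH_P_IP */
	    ntohs(h.ar_op) != 1 ||		/* ARPOP_REQUEST */
	    h.ar_hln != 6 || h.ar_pln != 4)
		return false;

	p = buf + sizeof(h);
	p += h.ar_hln;			/* skip sha */
	memcpy(sip, p, 4);
	p += 4;
	p += h.ar_hln;			/* skip tha */
	memcpy(tip, p, 4);
	return true;
}
```

`tip` is the address the bridge asked about; the patch then looks it up in `arp_tbl` and, if a connected neighbour exists on a remote endpoint, answers locally instead of flooding the request.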

Changes since v2
	- combined bools into "u32 flags"
	- replaced loop with !is_zero_ether_addr()
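Combining the bools into `u32 flags` turns each feature into one bit that is tested with `&` and reported to netlink as a 0/1 byte via `!!`. A stand-alone sketch using the same `VXLAN_F_*` values as the patch; `struct vxlan_cfg`, `cfg_apply_u8()` and `cfg_flag_u8()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values copied from the patch. */
#define VXLAN_F_LEARN	0x01
#define VXLAN_F_PROXY	0x02
#define VXLAN_F_RSC	0x04
#define VXLAN_F_L2MISS	0x08
#define VXLAN_F_L3MISS	0x10

/* Hypothetical stand-in for the relevant part of struct vxlan_dev. */
struct vxlan_cfg {
	uint32_t flags;
};

/* Set a feature bit when the optional u8 attribute is present and
 * non-zero, like the IFLA_VXLAN_PROXY/RSC/L2MISS/L3MISS handling in
 * vxlan_newlink(). (LEARN differs: it is also set when absent.) */
static void cfg_apply_u8(struct vxlan_cfg *cfg, const uint8_t *attr,
			 uint32_t bit)
{
	if (attr && *attr)
		cfg->flags |= bit;
}

/* Normalise one bit to the 0/1 value put into the netlink dump. */
static uint8_t cfg_flag_u8(const struct vxlan_cfg *cfg, uint32_t bit)
{
	return !!(cfg->flags & bit);
}
```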

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 8aca888..320a93f 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -29,6 +29,8 @@
 #include <linux/etherdevice.h>
 #include <linux/if_ether.h>
 #include <linux/hash.h>
+#include <net/arp.h>
+#include <net/ndisc.h>
 #include <net/ip.h>
 #include <net/icmp.h>
 #include <net/udp.h>
@@ -110,7 +112,7 @@ struct vxlan_dev {
 	__u16		  port_max;
 	__u8		  tos;		/* TOS override */
 	__u8		  ttl;
-	bool		  learn;
+	u32		  flags;	/* VXLAN_F_* below */
 
 	unsigned long	  age_interval;
 	struct timer_list age_timer;
@@ -122,6 +124,12 @@ struct vxlan_dev {
 	struct hlist_head fdb_head[FDB_HASH_SIZE];
 };
 
+#define VXLAN_F_LEARN	0x01
+#define VXLAN_F_PROXY	0x02
+#define VXLAN_F_RSC	0x04
+#define VXLAN_F_L2MISS	0x08
+#define VXLAN_F_L3MISS	0x10
+
 /* salt for hash table */
 static u32 vxlan_salt __read_mostly;
 
@@ -155,6 +163,7 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
 	struct nda_cacheinfo ci;
 	struct nlmsghdr *nlh;
 	struct ndmsg *ndm;
+	bool send_ip, send_eth;
 
 	nlh = nlmsg_put(skb, portid, seq, type, sizeof(*ndm), flags);
 	if (nlh == NULL)
@@ -162,16 +171,24 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
 
 	ndm = nlmsg_data(nlh);
 	memset(ndm, 0, sizeof(*ndm));
-	ndm->ndm_family	= AF_BRIDGE;
+
+	send_eth = send_ip = true;
+
+	if (type == RTM_GETNEIGH) {
+		ndm->ndm_family	= AF_INET;
+		send_ip = fdb->remote_ip != 0;
+		send_eth = !is_zero_ether_addr(fdb->eth_addr);
+	} else
+		ndm->ndm_family	= AF_BRIDGE;
 	ndm->ndm_state = fdb->state;
 	ndm->ndm_ifindex = vxlan->dev->ifindex;
 	ndm->ndm_flags = NTF_SELF;
 	ndm->ndm_type = NDA_DST;
 
-	if (nla_put(skb, NDA_LLADDR, ETH_ALEN, &fdb->eth_addr))
+	if (send_eth && nla_put(skb, NDA_LLADDR, ETH_ALEN, &fdb->eth_addr))
 		goto nla_put_failure;
 
-	if (nla_put_be32(skb, NDA_DST, fdb->remote_ip))
+	if (send_ip && nla_put_be32(skb, NDA_DST, fdb->remote_ip))
 		goto nla_put_failure;
 
 	ci.ndm_used	 = jiffies_to_clock_t(now - fdb->used);
@@ -223,6 +240,29 @@ errout:
 		rtnl_set_sk_err(net, RTNLGRP_NEIGH, err);
 }
 
+static void vxlan_ip_miss(struct net_device *dev, __be32 ipa)
+{
+	struct vxlan_dev *vxlan = netdev_priv(dev);
+	struct vxlan_fdb f;
+
+	memset(&f, 0, sizeof f);
+	f.state = NUD_STALE;
+	f.remote_ip = ipa; /* goes to NDA_DST */
+
+	vxlan_fdb_notify(vxlan, &f, RTM_GETNEIGH);
+}
+
+static void vxlan_fdb_miss(struct vxlan_dev *vxlan, const u8 eth_addr[ETH_ALEN])
+{
+	struct vxlan_fdb	f;
+
+	memset(&f, 0, sizeof f);
+	f.state = NUD_STALE;
+	memcpy(f.eth_addr, eth_addr, ETH_ALEN);
+
+	vxlan_fdb_notify(vxlan, &f, RTM_GETNEIGH);
+}
+
 /* Hash Ethernet address */
 static u32 eth_hash(const unsigned char *addr)
 {
@@ -552,6 +592,8 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 		goto drop;
 	}
 
+	skb_reset_mac_header(skb);
+
 	/* Re-examine inner Ethernet packet */
 	oip = ip_hdr(skb);
 	skb->protocol = eth_type_trans(skb, vxlan->dev);
@@ -561,7 +603,7 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 			       vxlan->dev->dev_addr) == 0)
 		goto drop;
 
-	if (vxlan->learn)
+	if (vxlan->flags & VXLAN_F_LEARN)
 		vxlan_snoop(skb->dev, oip->saddr, eth_hdr(skb)->h_source);
 
 	__skb_tunnel_rx(skb, vxlan->dev);
@@ -600,6 +642,117 @@ drop:
 	return 0;
 }
 
+static int arp_reduce(struct net_device *dev, struct sk_buff *skb)
+{
+	struct vxlan_dev *vxlan = netdev_priv(dev);
+	struct arphdr *parp;
+	u8 *arpptr, *sha;
+	__be32 sip, tip;
+	struct neighbour *n;
+
+	if (dev->flags & IFF_NOARP)
+		goto out;
+
+	if (!pskb_may_pull(skb, arp_hdr_len(dev))) {
+		dev->stats.tx_dropped++;
+		goto out;
+	}
+	parp = arp_hdr(skb);
+
+	if ((parp->ar_hrd != htons(ARPHRD_ETHER) &&
+	     parp->ar_hrd != htons(ARPHRD_IEEE802)) ||
+	    parp->ar_pro != htons(ETH_P_IP) ||
+	    parp->ar_op != htons(ARPOP_REQUEST) ||
+	    parp->ar_hln != dev->addr_len ||
+	    parp->ar_pln != 4)
+		goto out;
+	arpptr = (u8 *)parp + sizeof(struct arphdr);
+	sha = arpptr;
+	arpptr += dev->addr_len;	/* sha */
+	memcpy(&sip, arpptr, sizeof(sip));
+	arpptr += sizeof(sip);
+	arpptr += dev->addr_len;	/* tha */
+	memcpy(&tip, arpptr, sizeof(tip));
+
+	if (ipv4_is_loopback(tip) ||
+	    ipv4_is_multicast(tip))
+		goto out;
+
+	n = neigh_lookup(&arp_tbl, &tip, dev);
+
+	if (n) {
+		struct vxlan_dev *vxlan = netdev_priv(dev);
+		struct vxlan_fdb *f;
+		struct sk_buff	*reply;
+
+		if (!(n->nud_state & NUD_CONNECTED)) {
+			neigh_release(n);
+			goto out;
+		}
+
+		f = vxlan_find_mac(vxlan, n->ha);
+		if (f && f->remote_ip == 0) {
+			/* bridge-local neighbor */
+			neigh_release(n);
+			goto out;
+		}
+
+		reply = arp_create(ARPOP_REPLY, ETH_P_ARP, sip, dev, tip, sha,
+				n->ha, sha);
+
+		neigh_release(n);
+
+		skb_reset_mac_header(reply);
+		__skb_pull(reply, skb_network_offset(reply));
+		reply->ip_summed = CHECKSUM_UNNECESSARY;
+		reply->pkt_type = PACKET_HOST;
+
+		if (netif_rx_ni(reply) == NET_RX_DROP)
+			dev->stats.rx_dropped++;
+	} else if (vxlan->flags & VXLAN_F_L3MISS)
+		vxlan_ip_miss(dev, tip);
+out:
+	consume_skb(skb);
+	return NETDEV_TX_OK;
+}
+
+static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
+{
+	struct vxlan_dev *vxlan = netdev_priv(dev);
+	struct neighbour *n;
+	struct iphdr *pip;
+
+	if (is_multicast_ether_addr(eth_hdr(skb)->h_dest))
+		return false;
+
+	n = NULL;
+	switch (ntohs(eth_hdr(skb)->h_proto)) {
+	case ETH_P_IP:
+		if (!pskb_may_pull(skb, sizeof(struct iphdr)))
+			return false;
+		pip = ip_hdr(skb);
+		n = neigh_lookup(&arp_tbl, &pip->daddr, dev);
+		break;
+	default:
+		return false;
+	}
+
+	if (n) {
+		bool diff;
+
+		diff = compare_ether_addr(eth_hdr(skb)->h_dest, n->ha) != 0;
+		if (diff) {
+			memcpy(eth_hdr(skb)->h_source, eth_hdr(skb)->h_dest,
+				dev->addr_len);
+			memcpy(eth_hdr(skb)->h_dest, n->ha, dev->addr_len);
+		}
+		neigh_release(n);
+		return diff;
+	} else if (vxlan->flags & VXLAN_F_L3MISS)
+		vxlan_ip_miss(dev, pip->daddr);
+	return false;
+}
+
 /* Extract dsfield from inner protocol */
 static inline u8 vxlan_get_dsfield(const struct iphdr *iph,
 				   const struct sk_buff *skb)
@@ -622,22 +775,6 @@ static inline u8 vxlan_ecn_encap(u8 tos,
 	return INET_ECN_encapsulate(tos, inner);
 }
 
-static __be32 vxlan_find_dst(struct vxlan_dev *vxlan, struct sk_buff *skb)
-{
-	const struct ethhdr *eth = (struct ethhdr *) skb->data;
-	const struct vxlan_fdb *f;
-
-	if (is_multicast_ether_addr(eth->h_dest))
-		return vxlan->gaddr;
-
-	f = vxlan_find_mac(vxlan, eth->h_dest);
-	if (f)
-		return f->remote_ip;
-	else
-		return vxlan->gaddr;
-
-}
-
 static void vxlan_sock_free(struct sk_buff *skb)
 {
 	sock_put(skb->sk);
@@ -684,6 +821,7 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct vxlan_dev *vxlan = netdev_priv(dev);
 	struct rtable *rt;
 	const struct iphdr *old_iph;
+	struct ethhdr *eth;
 	struct iphdr *iph;
 	struct vxlanhdr *vxh;
 	struct udphdr *uh;
@@ -694,10 +832,50 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 	__be16 df = 0;
 	__u8 tos, ttl;
 	int err;
+	bool did_rsc = false;
+	const struct vxlan_fdb *f;
+
+	skb_reset_mac_header(skb);
+	eth = eth_hdr(skb);
+
+	if ((vxlan->flags & VXLAN_F_PROXY) && ntohs(eth->h_proto) == ETH_P_ARP)
+		return arp_reduce(dev, skb);
+	else if ((vxlan->flags&VXLAN_F_RSC) && ntohs(eth->h_proto) == ETH_P_IP)
+		did_rsc = route_shortcircuit(dev, skb);
 
-	dst = vxlan_find_dst(vxlan, skb);
-	if (!dst)
+	f = vxlan_find_mac(vxlan, eth->h_dest);
+	if (f == NULL) {
+		did_rsc = false;
+		dst = vxlan->gaddr;
+		if (!dst && (vxlan->flags & VXLAN_F_L2MISS) &&
+		    !is_multicast_ether_addr(eth->h_dest))
+			vxlan_fdb_miss(vxlan, eth->h_dest);
+	} else
+		dst = f->remote_ip;
+
+	if (!dst) {
+		if (did_rsc) {
+			__skb_pull(skb, skb_network_offset(skb));
+			skb->ip_summed = CHECKSUM_NONE;
+			skb->pkt_type = PACKET_HOST;
+
+			/* short-circuited back to local bridge */
+			if (netif_rx(skb) == NET_RX_SUCCESS) {
+				struct vxlan_stats *stats =
+						this_cpu_ptr(vxlan->stats);
+
+				u64_stats_update_begin(&stats->syncp);
+				stats->tx_packets++;
+				stats->tx_bytes += pkt_len;
+				u64_stats_update_end(&stats->syncp);
+			} else {
+				dev->stats.tx_errors++;
+				dev->stats.tx_aborted_errors++;
+			}
+			return NETDEV_TX_OK;
+		}
 		goto drop;
+	}
 
 	/* Need space for new headers (invalidates iph ptr) */
 	if (skb_cow_head(skb, VXLAN_HEADROOM))
@@ -1020,6 +1198,10 @@ static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = {
 	[IFLA_VXLAN_AGEING]	= { .type = NLA_U32 },
 	[IFLA_VXLAN_LIMIT]	= { .type = NLA_U32 },
 	[IFLA_VXLAN_PORT_RANGE] = { .len  = sizeof(struct ifla_vxlan_port_range) },
+	[IFLA_VXLAN_PROXY]	= { .type = NLA_U8 },
+	[IFLA_VXLAN_RSC]	= { .type = NLA_U8 },
+	[IFLA_VXLAN_L2MISS]	= { .type = NLA_U8 },
+	[IFLA_VXLAN_L3MISS]	= { .type = NLA_U8 },
 };
 
 static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
@@ -1111,13 +1293,25 @@ static int vxlan_newlink(struct net *net, struct net_device *dev,
 		vxlan->ttl = nla_get_u8(data[IFLA_VXLAN_TTL]);
 
 	if (!data[IFLA_VXLAN_LEARNING] || nla_get_u8(data[IFLA_VXLAN_LEARNING]))
-		vxlan->learn = true;
+		vxlan->flags |= VXLAN_F_LEARN;
 
 	if (data[IFLA_VXLAN_AGEING])
 		vxlan->age_interval = nla_get_u32(data[IFLA_VXLAN_AGEING]);
 	else
 		vxlan->age_interval = FDB_AGE_DEFAULT;
 
+	if (data[IFLA_VXLAN_PROXY] && nla_get_u8(data[IFLA_VXLAN_PROXY]))
+		vxlan->flags |= VXLAN_F_PROXY;
+
+	if (data[IFLA_VXLAN_RSC] && nla_get_u8(data[IFLA_VXLAN_RSC]))
+		vxlan->flags |= VXLAN_F_RSC;
+
+	if (data[IFLA_VXLAN_L2MISS] && nla_get_u8(data[IFLA_VXLAN_L2MISS]))
+		vxlan->flags |= VXLAN_F_L2MISS;
+
+	if (data[IFLA_VXLAN_L3MISS] && nla_get_u8(data[IFLA_VXLAN_L3MISS]))
+		vxlan->flags |= VXLAN_F_L3MISS;
+
 	if (data[IFLA_VXLAN_LIMIT])
 		vxlan->addrmax = nla_get_u32(data[IFLA_VXLAN_LIMIT]);
 
@@ -1154,6 +1348,10 @@ static size_t vxlan_get_size(const struct net_device *dev)
 		nla_total_size(sizeof(__u8)) +	/* IFLA_VXLAN_TTL */
 		nla_total_size(sizeof(__u8)) +	/* IFLA_VXLAN_TOS */
 		nla_total_size(sizeof(__u8)) +	/* IFLA_VXLAN_LEARNING */
+		nla_total_size(sizeof(__u8)) +	/* IFLA_VXLAN_PROXY */
+		nla_total_size(sizeof(__u8)) +	/* IFLA_VXLAN_RSC */
+		nla_total_size(sizeof(__u8)) +	/* IFLA_VXLAN_L2MISS */
+		nla_total_size(sizeof(__u8)) +	/* IFLA_VXLAN_L3MISS */
 		nla_total_size(sizeof(__u32)) +	/* IFLA_VXLAN_AGEING */
 		nla_total_size(sizeof(__u32)) +	/* IFLA_VXLAN_LIMIT */
 		nla_total_size(sizeof(struct ifla_vxlan_port_range)) +
@@ -1182,7 +1380,15 @@ static int vxlan_fill_info(struct sk_buff *skb, const struct net_device *dev)
 
 	if (nla_put_u8(skb, IFLA_VXLAN_TTL, vxlan->ttl) ||
 	    nla_put_u8(skb, IFLA_VXLAN_TOS, vxlan->tos) ||
-	    nla_put_u8(skb, IFLA_VXLAN_LEARNING, vxlan->learn) ||
+	    nla_put_u8(skb, IFLA_VXLAN_LEARNING,
+			!!(vxlan->flags & VXLAN_F_LEARN)) ||
+	    nla_put_u8(skb, IFLA_VXLAN_PROXY,
+			!!(vxlan->flags & VXLAN_F_PROXY)) ||
+	    nla_put_u8(skb, IFLA_VXLAN_RSC, !!(vxlan->flags & VXLAN_F_RSC)) ||
+	    nla_put_u8(skb, IFLA_VXLAN_L2MISS,
+			!!(vxlan->flags & VXLAN_F_L2MISS)) ||
+	    nla_put_u8(skb, IFLA_VXLAN_L3MISS,
+			!!(vxlan->flags & VXLAN_F_L3MISS)) ||
 	    nla_put_u32(skb, IFLA_VXLAN_AGEING, vxlan->age_interval) ||
 	    nla_put_u32(skb, IFLA_VXLAN_LIMIT, vxlan->addrmax))
 		goto nla_put_failure;
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 5c80cb1..89695ff 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -285,6 +285,10 @@ enum {
 	IFLA_VXLAN_AGEING,
 	IFLA_VXLAN_LIMIT,
 	IFLA_VXLAN_PORT_RANGE,
+	IFLA_VXLAN_PROXY,
+	IFLA_VXLAN_RSC,
+	IFLA_VXLAN_L2MISS,
+	IFLA_VXLAN_L3MISS,
 	__IFLA_VXLAN_MAX
 };
 #define IFLA_VXLAN_MAX	(__IFLA_VXLAN_MAX - 1)
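The patched `vxlan_xmit()` replaces `vxlan_find_dst()` with an explicit decision. An FDB hit sends to the entry's `remote_ip`. A miss falls back to the group address `gaddr` and cancels any route short-circuit, and raises an L2-miss notification only when there is no group and the destination MAC is unicast. A zero destination (for example an FDB entry with `remote_ip == 0`, i.e. a bridge-local neighbour) either loops a short-circuited frame back to the local stack or drops it. The user-space model below restates that logic; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum xmit_action {
	XMIT_ENCAP,		/* encapsulate and send to *dst */
	XMIT_LOCAL_RX,		/* short-circuited: hand back via netif_rx() */
	XMIT_DROP,
};

/* Model of the destination choice in the patched vxlan_xmit().
 * fdb_remote_ip == NULL means vxlan_find_mac() found no entry. */
static enum xmit_action choose_dst(const uint32_t *fdb_remote_ip,
				   uint32_t gaddr, bool l2miss_enabled,
				   bool dest_is_multicast, bool did_rsc,
				   uint32_t *dst, bool *notify_l2miss)
{
	*notify_l2miss = false;

	if (fdb_remote_ip == NULL) {
		did_rsc = false;	/* unknown MAC: no short-circuit */
		*dst = gaddr;
		if (!*dst && l2miss_enabled && !dest_is_multicast)
			*notify_l2miss = true;
	} else {
		*dst = *fdb_remote_ip;
	}

	if (!*dst)
		return did_rsc ? XMIT_LOCAL_RX : XMIT_DROP;
	return XMIT_ENCAP;
}
```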


* Re: [Xen-devel] [PATCH] xen/netfront: handle compound page fragments on transmit
From: Jan Beulich @ 2012-11-20 12:28 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Stefan Bader, Sander Eikelenboom, Eric Dumazet,
	Konrad Rzeszutek Wilk, xen-devel, ANNIE LI, netdev
In-Reply-To: <1353411606-15940-1-git-send-email-ian.campbell@citrix.com>

>>> On 20.11.12 at 12:40, Ian Campbell <ian.campbell@citrix.com> wrote:
> An SKB paged fragment can consist of a compound page with order > 0.
> However the netchannel protocol deals only in PAGE_SIZE frames.
> 
> Handle this in xennet_make_frags by iterating over the frames which
> make up the page.
> 
> This is the netfront equivalent to 6a8ed462f16b for netback.

Wouldn't you need to be at least a little more conservative here
with respect to resource use? I realize that get_id_from_freelist()
return values were never checked, and that failure of
gnttab_claim_grant_reference() was always handled via BUG_ON(),
but since netfront_tx_slot_available() doesn't account for
compound page fragments, I think this (lack of) error handling
needs improving as part of this change. That holds regardless of
someone having said (I think) that the pages referenced from an
skb's fragments would usually not exceed MAX_SKB_FRAGS in total;
"usually" just isn't enough, IMO.

Jan
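To make the slot-accounting concern concrete: the quoted loop first drops whole frames from the start of the fragment (`page += offset >> PAGE_SHIFT; offset &= ~PAGE_MASK;`) and then emits one TX request per PAGE_SIZE frame it touches, so a single fragment can consume several ring slots. The sketch below counts those requests in user space, assuming 4 KiB pages; `count_tx_slots()` is a hypothetical helper, not netfront code.

```c
#include <assert.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)	/* assumed 4 KiB pages */
#define PAGE_MASK	(~(PAGE_SIZE - 1))

/* Number of TX requests the quoted xennet_make_frags() loop would
 * issue for a fragment of `size` bytes starting `offset` bytes into
 * a (possibly compound) page. */
static unsigned long count_tx_slots(unsigned long offset, unsigned long size)
{
	unsigned long slots = 0;

	offset &= ~PAGE_MASK;		/* skip unused frames at the start */
	while (size > 0) {
		unsigned long bytes;

		assert(offset < PAGE_SIZE);	/* mirrors the patch's BUG_ON() */
		bytes = PAGE_SIZE - offset;
		if (bytes > size)
			bytes = size;
		slots++;
		offset += bytes;
		size -= bytes;
		if (offset == PAGE_SIZE)	/* move on to the next frame */
			offset = 0;
	}
	return slots;
}
```

For size > 0 this equals DIV_ROUND_UP((offset & ~PAGE_MASK) + size, PAGE_SIZE); a slot-availability check would presumably need to sum that over all fragments rather than count one slot per fragment.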

> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Cc: netdev@vger.kernel.org 
> Cc: xen-devel@lists.xen.org 
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
> Cc: ANNIE LI <annie.li@oracle.com>
> Cc: Sander Eikelenboom <linux@eikelenboom.it>
> Cc: Stefan Bader <stefan.bader@canonical.com>
> ---
>  drivers/net/xen-netfront.c |   58 +++++++++++++++++++++++++++++++++----------
>  1 files changed, 44 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index caa0110..a12b99a 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -452,24 +452,54 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
>  	/* Grant backend access to each skb fragment page. */
>  	for (i = 0; i < frags; i++) {
>  		skb_frag_t *frag = skb_shinfo(skb)->frags + i;
> +		struct page *page = skb_frag_page(frag);
> +		unsigned long size = skb_frag_size(frag);
> +		unsigned long offset = frag->page_offset;
>  
> -		tx->flags |= XEN_NETTXF_more_data;
> +		/* Data must not cross a page boundary. */
> +		BUG_ON(size + offset > PAGE_SIZE<<compound_order(page));
>  
> -		id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs);
> -		np->tx_skbs[id].skb = skb_get(skb);
> -		tx = RING_GET_REQUEST(&np->tx, prod++);
> -		tx->id = id;
> -		ref = gnttab_claim_grant_reference(&np->gref_tx_head);
> -		BUG_ON((signed short)ref < 0);
> +		/* Skip unused frames from start of page */
> +		page += offset >> PAGE_SHIFT;
> +		offset &= ~PAGE_MASK;
>  
> -		mfn = pfn_to_mfn(page_to_pfn(skb_frag_page(frag)));
> -		gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id,
> -						mfn, GNTMAP_readonly);
> +		while (size > 0) {
> +			unsigned long bytes;
>  
> -		tx->gref = np->grant_tx_ref[id] = ref;
> -		tx->offset = frag->page_offset;
> -		tx->size = skb_frag_size(frag);
> -		tx->flags = 0;
> +			BUG_ON(offset >= PAGE_SIZE);
> +
> +			bytes = PAGE_SIZE - offset;
> +			if (bytes > size)
> +				bytes = size;
> +
> +			tx->flags |= XEN_NETTXF_more_data;
> +
> +			id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs);
> +			np->tx_skbs[id].skb = skb_get(skb);
> +			tx = RING_GET_REQUEST(&np->tx, prod++);
> +			tx->id = id;
> +			ref = gnttab_claim_grant_reference(&np->gref_tx_head);
> +			BUG_ON((signed short)ref < 0);
> +
> +			mfn = pfn_to_mfn(page_to_pfn(page));
> +			gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id,
> +							mfn, GNTMAP_readonly);
> +
> +			tx->gref = np->grant_tx_ref[id] = ref;
> +			tx->offset = offset;
> +			tx->size = bytes;
> +			tx->flags = 0;
> +
> +			offset += bytes;
> +			size -= bytes;
> +
> +			/* Next frame */
> +			if (offset == PAGE_SIZE && size) {
> +				BUG_ON(!PageCompound(page));
> +				page++;
> +				offset = 0;
> +			}
> +		}
>  	}
>  
>  	np->tx.req_prod_pvt = prod;
> -- 
> 1.7.2.5
> 
> 

