Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 4/7] netprio_cgroup: reimplement priomap expansion
From: Tejun Heo @ 2012-11-20 14:38 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: nhorman-2XuSBdqkA4R54TAoqtyWWQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w, tgraf-G/eBtMaohhA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <50AB435E.8060901-kQCPcA+X3s7YtjvyW6yDsg@public.gmane.org>

Hello, Daniel.

On Tue, Nov 20, 2012 at 09:46:22AM +0100, Daniel Wagner wrote:
> struct netprio_map {
> 	struct rcu_head rcu;
> 	struct netprio_aux *aux;	/* auxiliary config array */
> 	u32 priomap_len;
> 	u32 priomap[];
> };
> 
> Is there a specific reason why aux and priomap is handled
> differently? Couldn't you just use same approach for both variables,
> e.g. re/allocating only them here and leave the allocation struct
> netprio_map in cgrp_css_alloc()?

->aux is no longer added, so the consistency issue doesn't exist
anymore.  The reason why they were handled differently before (or
rather why I didn't change priomap[] to be allocated separately) was
that pointer chasing tends to be more expensive than offsetting.  I
don't know how much effect it would have in this case but things
sitting in packet in/out paths can be very hot so didn't wanna disturb
it.

> Also the algorithm to figure out the size of the array might be a
> bit too aggressive in my opinion. So you always start at
> PRIOMAP_MIN_SIZE and then try to double the size until target_idx
> fits. Wouldn't it make sense to start to look for the new size
> beginning at old->priomap_len and then do the power-of-two increase?

The only downside of always starting from PRIOMAP_MIN_SIZE is
iterating several more times in the sizing loop which isn't really
anything to worry about.  The loop is structured that way because I
wanted to keep the size of the whole thing power-of-two.  Due to the
fields before priomap[], if we size priomap_len power-of-two, we'll
always end up with something slightly over power-of-two, which is
usually the worst size to allocate.

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [PATCH 7/7] netprio_cgroup: allow nesting and inherit config on cgroup creation
From: Tejun Heo @ 2012-11-20 14:40 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: nhorman-2XuSBdqkA4R54TAoqtyWWQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w, tgraf-G/eBtMaohhA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <50AB45EA.2050507-kQCPcA+X3s7YtjvyW6yDsg@public.gmane.org>

Hello, Daniel.

On Tue, Nov 20, 2012 at 09:57:14AM +0100, Daniel Wagner wrote:
> >-static void cgrp_css_free(struct cgroup *cgrp)
> >+static int cgrp_css_online(struct cgroup *cgrp)
> >  {
> >-	struct cgroup_netprio_state *cs = cgrp_netprio_state(cgrp);
> >+	struct cgroup *parent = cgrp->parent;
> >  	struct net_device *dev;
> >+	int ret = 0;
> >+
> >+	if (!parent)
> >+		return 0;
> 
> BTW, parent is always != NULL, because the root cgroup will be
> attached to the dummytop cgroup.

Hmmm?  I'm confused.  When ->css_online() is called for dummytop in
cgroup_init_subsys(), its parent is NULL.  What am I missing?

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [PATCH v2] checkpatch: add double empty line check
From: Andy Whitcroft @ 2012-11-20 14:43 UTC (permalink / raw)
  To: Eilon Greenstein; +Cc: Joe Perches, David Rientjes, linux-kernel, netdev
In-Reply-To: <1353421624.6559.9.camel@lb-tlvb-eilong.il.broadcom.com>

On Tue, Nov 20, 2012 at 04:27:04PM +0200, Eilon Greenstein wrote:
> On Tue, 2012-11-20 at 11:52 +0000, Andy Whitcroft wrote:
> 
> Andy, thanks for reviewing this patch.
> 
> > On Sat, Nov 17, 2012 at 01:17:37PM +0200, Eilon Greenstein wrote:
> > > Changes from previous attempt:
> > > - Use CHK instead of WARN
> > > - Issue only one warning per empty lines block
> > > 
> > > Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
> > > ---
> > >  scripts/checkpatch.pl |    8 ++++++++
> > >  1 file changed, 8 insertions(+)
> > > 
> > > diff --git a/scripts/checkpatch.pl b/scripts/linescheckpatch.pl
> > > index 21a9f5d..13d264f 100755
> > > --- a/scripts/checkpatch.pl
> > > +++ b/scripts/checkpatch.pl
> > > @@ -3579,6 +3579,14 @@ sub process {
> > >  			WARN("EXPORTED_WORLD_WRITABLE",
> > >  			     "Exporting world writable files is usually an error. Consider more restrictive permissions.\n" . $herecurr);
> > >  		}
> > > +
> > > +# check for double empty lines
> > > +		if ($line =~ /^\+\s*$/ &&
> > > +		    ($rawlines[$linenr] =~ /^\s*$/ ||
> > > +		     $prevline =~ /^\+?\s*$/ && $rawlines[$linenr] !~ /^\+\s*$/)) {
> > > +			CHK("DOUBLE_EMPTY_LINE",
> > > +			    "One empty line should be sufficient. Consider removing this one.\n" . $herecurr);
> > > +		}
> > >  	}
> > >  
> > >  	# If we have no input at all, then there is nothing to report on
> > 
> > In your previous version you indicated you would be emiting one per group
> > of lines, I do not see how this does that.
> 
> This is what I'm testing:
> Only if the current line is a new blank line and:
> 	if the next line is empty but not newly added (this is the part that
> will make sure we get only one warning for a bunch of new lines - only
> the last newly added line will hit this condition)
> or
> 	if the previous line was empty (either new empty line or existing empty
> line) and the next line is not a new empty line (so we will issue just
> one warning).
> 
> I tested it on few examples, and did not see a problem. Can you share an
> example where it issues more than a single warning for a newly
> introduced consecutive new lines?

No indeed.  That was testing failure on my behalf.
> 
> > Also this fails if the fragment
> > is at the top of the hunk emiting a perl warning.
> 
> I did not see this warning. Can you please share this example? I tried
> adding a couple of empty lines at the beginning of a file and it seemed
> to work just fine for me (using perl v5.14.2).lines

Ok, this is actually if it is at the bottom, not the top.  So if you
have a range of lines newly added to the bottom of the file.  Leading to
this warning:

Use of uninitialized value within @rawlines in pattern match (m//) at
../checkpatch/scripts/checkpatch.pl line 3586.

-apw

^ permalink raw reply

* Re: [PATCH] xen/netfront: handle compound page fragments on transmit
From: Eric Dumazet @ 2012-11-20 14:45 UTC (permalink / raw)
  To: Ian Campbell
  Cc: netdev, xen-devel, Eric Dumazet, Konrad Rzeszutek Wilk, ANNIE LI,
	Sander Eikelenboom, Stefan Bader
In-Reply-To: <1353411606-15940-1-git-send-email-ian.campbell@citrix.com>

On Tue, 2012-11-20 at 11:40 +0000, Ian Campbell wrote:
> An SKB paged fragment can consist of a compound page with order > 0.
> However the netchannel protocol deals only in PAGE_SIZE frames.
> 
> Handle this in xennet_make_frags by iterating over the frames which
> make up the page.
> 
> This is the netfront equivalent to 6a8ed462f16b for netback.
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Cc: netdev@vger.kernel.org
> Cc: xen-devel@lists.xen.org
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
> Cc: ANNIE LI <annie.li@oracle.com>
> Cc: Sander Eikelenboom <linux@eikelenboom.it>
> Cc: Stefan Bader <stefan.bader@canonical.com>
> ---
>  drivers/net/xen-netfront.c |   58 +++++++++++++++++++++++++++++++++----------
>  1 files changed, 44 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index caa0110..a12b99a 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -452,24 +452,54 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
>  	/* Grant backend access to each skb fragment page. */
>  	for (i = 0; i < frags; i++) {
>  		skb_frag_t *frag = skb_shinfo(skb)->frags + i;
> +		struct page *page = skb_frag_page(frag);
> +		unsigned long size = skb_frag_size(frag);
> +		unsigned long offset = frag->page_offset;
>  
> -		tx->flags |= XEN_NETTXF_more_data;
> +		/* Data must not cross a page boundary. */
> +		BUG_ON(size + offset > PAGE_SIZE<<compound_order(page));
>  
> -		id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs);
> -		np->tx_skbs[id].skb = skb_get(skb);
> -		tx = RING_GET_REQUEST(&np->tx, prod++);
> -		tx->id = id;
> -		ref = gnttab_claim_grant_reference(&np->gref_tx_head);
> -		BUG_ON((signed short)ref < 0);
> +		/* Skip unused frames from start of page */

'frame' in the comment means an order-0 page ?

> +		page += offset >> PAGE_SHIFT;
> +		offset &= ~PAGE_MASK;
>  
> -		mfn = pfn_to_mfn(page_to_pfn(skb_frag_page(frag)));
> -		gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id,
> -						mfn, GNTMAP_readonly);
> +		while (size > 0) {
> +			unsigned long bytes;
>  
> -		tx->gref = np->grant_tx_ref[id] = ref;
> -		tx->offset = frag->page_offset;
> -		tx->size = skb_frag_size(frag);
> -		tx->flags = 0;
> +			BUG_ON(offset >= PAGE_SIZE);
> +
> +			bytes = PAGE_SIZE - offset;
> +			if (bytes > size)
> +				bytes = size;
> +
> +			tx->flags |= XEN_NETTXF_more_data;
> +
> +			id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs);
> +			np->tx_skbs[id].skb = skb_get(skb);

BTW this skb_get() means extra atomic operations for every 4096 bytes
unit, and an extra atomic op (and test for final 0) at TX completion.
This could be avoided, by setting np->tx_skbs[id].skb = skb only for the
very last unit.

> +			tx = RING_GET_REQUEST(&np->tx, prod++);
> +			tx->id = id;
> +			ref = gnttab_claim_grant_reference(&np->gref_tx_head);
> +			BUG_ON((signed short)ref < 0);
> +
> +			mfn = pfn_to_mfn(page_to_pfn(page));
> +			gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id,
> +							mfn, GNTMAP_readonly);
> +
> +			tx->gref = np->grant_tx_ref[id] = ref;
> +			tx->offset = offset;
> +			tx->size = bytes;
> +			tx->flags = 0;
> +
> +			offset += bytes;
> +			size -= bytes;
> +
> +			/* Next frame */
> +			if (offset == PAGE_SIZE && size) {
> +				BUG_ON(!PageCompound(page));
> +				page++;
> +				offset = 0;
> +			}
> +		}
>  	}
>  
>  	np->tx.req_prod_pvt = prod;

Acked-by: Eric Dumazet <edumazet@google.com>

Thanks !

^ permalink raw reply

* Re: question about eth_header
From: Rami Rosen @ 2012-11-20 14:51 UTC (permalink / raw)
  To: Dmitry Kravkov; +Cc: netdev
In-Reply-To: <1353419044.16036.6.camel@lb-tlvb-dmitry>

Hi,
I think that you should call skb_reset_mac_header()
before calling eth_hdr() and that skb_reset_mac_header() should not be
inside eth_hdr(). Following is explanation:

In the RX path, when we get a packet in the driver, what essentially
happens is that at this point, we have a pointer to
the packet buffer in skb->data, and its length in skb->len.

With ethernet packets, the driver will call eth_type_trans(); please
look at ethernet drivers code.
eth_type_trans() indeed sets the MAC header pointer by calling
skb_reset_mac_header() on the skb.

Afterwards, the eth_type_trans() will advance skb->data by
14, the size of the Ethernet header, and decrease skb->len
by calling skb_pull_inline(skb, ETH_HLEN).

In case you are wondering why the skb_reset_mac_header() is not inside
the eth_hdr(), I think that the reason is that, once
skb_reset_mac_header() on the skb was called, there are cases when we
want to make several calls eth_hdr() without each time again resetting
the mac header, for example, in the bridging code.  There is no reason
to doing it in terms of performance.

rgs,
Rami Rosen

http://ramirose.wix.com/ramirosen

On Tue, Nov 20, 2012 at 3:44 PM, Dmitry Kravkov <dmitry@broadcom.com> wrote:
> Hi,
>
> I am trying to use eth_hdr() but it looks like it doesn't point to mac
> header (mac header is present at skb->data). Shouldn't
> skb_reset_mac_header() be called inside eth_header()?
>
> Thanks.
>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] xen/netfront: handle compound page fragments on transmit
From: Ian Campbell @ 2012-11-20 15:05 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev@vger.kernel.org, xen-devel@lists.xen.org, Eric Dumazet,
	Konrad Rzeszutek Wilk, ANNIE LI, Sander Eikelenboom, Stefan Bader
In-Reply-To: <1353422735.2590.6.camel@edumazet-glaptop>

On Tue, 2012-11-20 at 14:45 +0000, Eric Dumazet wrote:
> > +		/* Skip unused frames from start of page */
> 
> 'frame' in the comment means an order-0 page ?

Yes. Confusing in the context of a network driver I know! I couldn't
think of a better term.

> > +			id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs);
> > +			np->tx_skbs[id].skb = skb_get(skb);
> 
> BTW this skb_get() means extra atomic operations for every 4096 bytes
> unit, and an extra atomic op (and test for final 0) at TX completion.
> This could be avoided, by setting np->tx_skbs[id].skb = skb only for the
> very last unit.

Thanks. Might be tricky because guests can ack the individual requests
in any order but it's something worth having a look at.

> >  	np->tx.req_prod_pvt = prod;
> 
> Acked-by: Eric Dumazet <edumazet@google.com>
> 
> Thanks !

Thanks for the review.

Ian.

> 
> 

^ permalink raw reply

* Fwd: Network soft and hard irqs statistics
From: Javier Domingo @ 2012-11-20 15:05 UTC (permalink / raw)
  To: netdev
In-Reply-To: <CALZVapkHda-tYNJALJWjhGwFBjAet84gxJam2UoK9WzMKQE6Bw@mail.gmail.com>

I have released the mentioned code in

https://github.com/txomon/linux

It now is giving some kernel panics due to some page fault during
net_rx_action because I didn't know how to put this in current kernel,
but I am currently working in an alternative solution

https://github.com/txomon/linux/blob/affde7645451eb62cdd1993a8cef7b5325e30b96/net/core/dev.c#L3944

Hope someone can help me now :D

Javier Domingo



2012/11/15 Javier Domingo <javierdo1@gmail.com>
>
> Hello all,
>
> I am migrating some statistics we use in our research group to v3.6.
> This I don't think it will be usefull for anyone, as they measure
> softirqs, hardirqs, times on them, etc.
>
> We modified net_device structure to contain a structure that has
> several field of statistics.
>
> Patched the e1000 and tg3 drivers to measure hardirq times, and
> polling times. We also patched net_rx_action (the softirq) to check if
> we get out per budget, per jiffies and netif_receive_skb to measure
> times and how many packets are captured.
>
> At the moment, we have been working with a external module that
> accessed this vars, creating proc entries, and allowing us to reset
> those measures.
>
> Now, I am trying to make it the most standard way, with the intention
> that when I talk to my boss, he will allow me to release the code.
>
> The main aim of this is to get some feedback about the interest this
> can have and to ask a few questions:
>
> -> Where may I create the proc entry? we currently use
> /proc/net/stats/<netdev>. I have also thought introducing that entry
> in fs/proc/proc_net.c, but I am not too sure which conventions there
> are...
>
> -> When migrating the net_rx_action, I found that we used this line:
> if(cpus_equal(mask,irq_desc[timedev->irq].affinity))
> before counting if we get out by budget or by jiffies to (I suppose)
> check that the softirq was the one assigned to this processor. Is that
> needed? I mean the softirq is run in just one of them... I don't
> really understand why it is important, so if anyone can explain me, I
> would be glad.
>
> -> We have patched the hardirqs in the driver, and the polling times
> too. I know the hardirqs are the only place in which we can measure
> them, but would it be posible to, instead of measuring the polls in
> e1000_clean (for example) measuring in dev.c net_rx_action, measure
> them around n->poll() call?
>    Have been doing like this because they told me that the context
> change was important... But I am not too sure on how important it is,
> if someone could give me any tip on this.
>
> -> In tg3.c I have seen that there are several hardirq function,
> though we usually only patched tg3_interrupt_tagged, I have patched
> all of them (for what they might be). Why are so many of them? Is that
> due to preparation for multiqueue cards?
>
> I hope someone can attend my doubts, and that I dont have asked too
> many newbie questions.
>
> Best regards,
>
> Javier Domingo

^ permalink raw reply

* Re: [Xen-devel] [PATCH] xen/netfront: handle compound page fragments on transmit
From: Ian Campbell @ 2012-11-20 15:06 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefan Bader, Sander Eikelenboom, Eric Dumazet,
	KonradRzeszutekWilk, xen-devel@lists.xen.org, ANNIE LI,
	netdev@vger.kernel.org
In-Reply-To: <50ABA29902000078000A9FB1@nat28.tlf.novell.com>

On Tue, 2012-11-20 at 14:32 +0000, Jan Beulich wrote:
> > @@ -517,15 +540,16 @@ static int xennet_start_xmit(struct sk_buff *skb, 
> > struct net_device *dev)
> >  	grant_ref_t ref;
> >  	unsigned long mfn;
> >  	int notify;
> > -	int frags = skb_shinfo(skb)->nr_frags;
> > +	int slots;
> >  	unsigned int offset = offset_in_page(data);
> >  	unsigned int len = skb_headlen(skb);
> >  	unsigned long flags;
> >  
> > -	frags += DIV_ROUND_UP(offset + len, PAGE_SIZE);
> > -	if (unlikely(frags > MAX_SKB_FRAGS + 1)) {
> > -		printk(KERN_ALERT "xennet: skb rides the rocket: %d frags\n",
> > -		       frags);
> > +	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
> > +		xennet_count_skb_frag_slots(skb);
> > +	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> 
> But still - isn't this wrong now (i.e. can't it now validly exceed the
> boundary checked for)?

In practice no because of the property that the number of pages backing
the frags is <= MAX_SKB_FRAGS even if you are using compound pages as
the frags.

Ian.

^ permalink raw reply

* Re: [PATCH v2] checkpatch: add double empty line check
From: Eilon Greenstein @ 2012-11-20 15:07 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: Joe Perches, David Rientjes, linux-kernel, netdev
In-Reply-To: <20121120144329.GE7955@dm>

On Tue, 2012-11-20 at 14:43 +0000, Andy Whitcroft wrote:

> > > Also this fails if the fragment
> > > is at the top of the hunk emiting a perl warning.
> > 
> > I did not see this warning. Can you please share this example? I tried
> > adding a couple of empty lines at the beginning of a file and it seemed
> > to work just fine for me (using perl v5.14.2).lines
> 
> Ok, this is actually if it is at the bottom, not the top.  So if you
> have a range of lines newly added to the bottom of the file.  Leading to
> this warning:
> 
> Use of uninitialized value within @rawlines in pattern match (m//) at
> ../checkpatch/scripts/checkpatch.pl line 3586.

Oh... of course, I should have seen that. I did not check changes at the
end of the file.

What do you say about something like that (using nextline out of
rawlines only if it defined):
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 21a9f5d..c0c610c 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3579,6 +3579,19 @@ sub process {
                        WARN("EXPORTED_WORLD_WRITABLE",
                             "Exporting world writable files is usually an error. Consider more restrictive permissions.
                }
+
+# check for double empty lines
+               if ($line =~ /^\+\s*$/) {
+                       my $nextline = "";
+                       if (defined($rawlines[$linenr])) {
+                               $nextline = $rawlines[$linenr];
+                       }
+                       if ($nextline =~ /^\s*$/ ||
+                           $prevline =~ /^\+?\s*$/ && $nextline !~ /^\+\s*$/) {
+                               CHK("DOUBLE_EMPTY_LINE",
+                                   "One empty line should be sufficient. Consider removing this one.\n" . $herecurr);
+                       }
+               }
        }
 


If you think it's appropriate, I will send a clean copy.

Thanks,
Eilon

^ permalink raw reply related

* Re: [PATCH 4/7] netprio_cgroup: reimplement priomap expansion
From: Daniel Wagner @ 2012-11-20 15:09 UTC (permalink / raw)
  To: Tejun Heo
  Cc: serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	tgraf-G/eBtMaohhA, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	lizefan-hv44wF8Li93QT0dZR+AlfA, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20121120143832.GO15971-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>

On 20.11.2012 15:38, Tejun Heo wrote:
> Hello, Daniel.
>
> On Tue, Nov 20, 2012 at 09:46:22AM +0100, Daniel Wagner wrote:
>> struct netprio_map {
>> 	struct rcu_head rcu;
>> 	struct netprio_aux *aux;	/* auxiliary config array */
>> 	u32 priomap_len;
>> 	u32 priomap[];
>> };
>>
>> Is there a specific reason why aux and priomap is handled
>> differently? Couldn't you just use same approach for both variables,
>> e.g. re/allocating only them here and leave the allocation struct
>> netprio_map in cgrp_css_alloc()?
>
> ->aux is no longer added, so the consistency issue doesn't exist
> anymore.

Right, I got confused looking at v1 and v2.

> The reason why they were handled differently before (or
> rather why I didn't change priomap[] to be allocated separately) was
> that pointer chasing tends to be more expensive than offsetting.  I
> don't know how much effect it would have in this case but things
> sitting in packet in/out paths can be very hot so didn't wanna disturb
> it.

I see.

>> Also the algorithm to figure out the size of the array might be a
>> bit too aggressive in my opinion. So you always start at
>> PRIOMAP_MIN_SIZE and then try to double the size until target_idx
>> fits. Wouldn't it make sense to start to look for the new size
>> beginning at old->priomap_len and then do the power-of-two increase?
>
> The only downside of always starting from PRIOMAP_MIN_SIZE is
> iterating several more times in the sizing loop which isn't really
> anything to worry about.  The loop is structured that way because I
> wanted to keep the size of the whole thing power-of-two.  Due to the
> fields before priomap[], if we size priomap_len power-of-two, we'll
> always end up with something slightly over power-of-two, which is
> usually the worst size to allocate.

Thanks for the explanation. I was pondering if the new size in power of 
two could be a bit too excessive and the allocation step could be 
linear, e.g. stick at 4096. target_id will increase linear, therefore 
linear increase might also be enough, no?

cheers,
daniel

^ permalink raw reply

* Re: [PATCH 7/7] netprio_cgroup: allow nesting and inherit config on cgroup creation
From: Daniel Wagner @ 2012-11-20 15:10 UTC (permalink / raw)
  To: Tejun Heo
  Cc: nhorman-2XuSBdqkA4R54TAoqtyWWQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w, tgraf-G/eBtMaohhA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <20121120144036.GP15971-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>

Hi Tejun,

On 20.11.2012 15:40, Tejun Heo wrote:
> Hello, Daniel.
>
> On Tue, Nov 20, 2012 at 09:57:14AM +0100, Daniel Wagner wrote:
>>> -static void cgrp_css_free(struct cgroup *cgrp)
>>> +static int cgrp_css_online(struct cgroup *cgrp)
>>>   {
>>> -	struct cgroup_netprio_state *cs = cgrp_netprio_state(cgrp);
>>> +	struct cgroup *parent = cgrp->parent;
>>>   	struct net_device *dev;
>>> +	int ret = 0;
>>> +
>>> +	if (!parent)
>>> +		return 0;
>>
>> BTW, parent is always != NULL, because the root cgroup will be
>> attached to the dummytop cgroup.
>
> Hmmm?  I'm confused.  When ->css_online() is called for dummytop in
> cgroup_init_subsys(), its parent is NULL.  What am I missing?

Forget my comment, I was really confused when writing this. I was 
looking only at cgroups which were created after bootup.

cheers,
daniel

^ permalink raw reply

* Re: [PATCH 4/7] netprio_cgroup: reimplement priomap expansion
From: Tejun Heo @ 2012-11-20 15:13 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: nhorman-2XuSBdqkA4R54TAoqtyWWQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w, tgraf-G/eBtMaohhA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <50AB9D22.5030000-kQCPcA+X3s7YtjvyW6yDsg@public.gmane.org>

Hello,

On Tue, Nov 20, 2012 at 04:09:22PM +0100, Daniel Wagner wrote:
> Thanks for the explanation. I was pondering if the new size in power
> of two could be a bit too excessive and the allocation step could be
> linear, e.g. stick at 4096. target_id will increase linear,
> therefore linear increase might also be enough, no?

Well, power-of-two resizing tends to behave relatively well under most
cases and slab allocations are binned by power-of-two sizes, so
linearly growing it doesn't really buy anything.

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [PATCH 4/7] netprio_cgroup: reimplement priomap expansion
From: Daniel Wagner @ 2012-11-20 15:16 UTC (permalink / raw)
  To: Tejun Heo
  Cc: serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	tgraf-G/eBtMaohhA, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	lizefan-hv44wF8Li93QT0dZR+AlfA, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20121120151322.GR15971-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>

On 20.11.2012 16:13, Tejun Heo wrote:
> Hello,
>
> On Tue, Nov 20, 2012 at 04:09:22PM +0100, Daniel Wagner wrote:
>> Thanks for the explanation. I was pondering if the new size in power
>> of two could be a bit too excessive and the allocation step could be
>> linear, e.g. stick at 4096. target_id will increase linear,
>> therefore linear increase might also be enough, no?
>
> Well, power-of-two resizing tends to behave relatively well under most
> cases and slab allocations are binned by power-of-two sizes, so
> linearly growing it doesn't really buy anything.

Convinced :)

Tested and Acked-by: Daniel Wagner <daniel.wagner-98C5kh4wR6ohFhg+JK9F0w@public.gmane.org>

thanks,
daniel

^ permalink raw reply

* BUG: unable to handle kernel paging request at 0000000000609920 in networking code on 3.2.23.
From: Rafal Kupka @ Telemetry @ 2012-11-20 15:16 UTC (permalink / raw)
  To: netdev@vger.kernel.org

Hello,

After upgrade to 3.2.23 (debian backports 2.6.32-45 package) from 2.6.32 I experience server crash.

[247494.043500] BUG: unable to handle kernel paging request at 0000000000609920
[247494.050663] IP: [<ffffffff810c6523>] put_page+0x4/0x27
[247494.056080] PGD 0
[247494.058221] Oops: 0000 [#1] SMP
[247494.061686] CPU 4
[247494.063720] Modules linked in: xt_multiport nf_defrag_ipv4 tcp_diag inet_diag xfrm_user xfrm4_tunnel ipcomp xfrm_ipcomp esp4 ah4 deflate zlib_deflate ctr twofish_generic twofish_x86_64_3way twofish_x86_64 twofish_common camellia serpent blowfish_generic blowfish_x86_64 blowfish_common cast5 des_generic cbc cryptd aes_x86_64 aes_generic xcbc rmd160 sha512_generic sha256_generic sha1_ssse3 sha1_generic hmac crypto_null af_key ipip tunnel4 ipt_ECN xt_TCPOPTSTRIP xt_tcpudp xt_comment iptable_mangle ip_tables x_tables loop i7core_edac edac_core snd_pcm snd_timer snd acpi_cpufreq mperf coretemp tpm_tis tpm tpm_bios i2c_i801 soundcore snd_page_alloc processor i2c_core psmouse pcspkr evdev thermal_sys serio_raw button crc32c_intel ext4 mbcache jbd2 crc16 dm_mod raid1 md_mod sd_mod crc_t10dif u
 sbhid hid ahci libahci libata ehci_hcd scsi_mod usbcore e1000e usb_common [last unloaded: nf_conntrack]
[247494.146444]
[247494.148120] Pid: 0, comm: swapper/4 Not tainted 3.2.0-0.bpo.3-amd64 #1 Supermicro X8SIE/X8SIE
[247494.156961] RIP: 0010:[<ffffffff810c6523>]  [<ffffffff810c6523>] put_page+0x4/0x27
[247494.164820] RSP: 0018:ffff88023fd03b40  EFLAGS: 00010282
[247494.170284] RAX: ffff8802340d5c80 RBX: ffff8801dcc6e680 RCX: 00000000219c3aab
[247494.177690] RDX: 0000000000000000 RSI: ffff8801ddea5bc0 RDI: 0000000000609920
[247494.185166] RBP: 0000000000000001 R08: ffff8801ddea5bc0 R09: ffff88023366c000
[247494.192640] R10: 00000001ddc550f6 R11: 0000000000000001 R12: ffff8802340d5c62
[247494.200028] R13: ffff8802340d5c62 R14: 0000000000000000 R15: ffff8802340d5c4e
[247494.207505] FS:  0000000000000000(0000) GS:ffff88023fd00000(0000) knlGS:0000000000000000
[247494.215936] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[247494.221824] CR2: 0000000000609920 CR3: 0000000001605000 CR4: 00000000000006e0
[247494.229220] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[247494.236617] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[247494.244093] Process swapper/4 (pid: 0, threadinfo ffff880236de2000, task ffff880236db88b0)
[247494.252609] Stack:
[247494.254732]  0000000100000000 ffffffff8129e5fa ffff8801dcc6e680 ffff8801dcc6e680
[247494.262482]  ffff8801dcc6e680 ffffffff8129ec02 ffff8801dd8b7240 ffffffff812e3a5b
[247494.270191]  000000003565d280 000000000002b80d ffffffffa025f4b8 ffff8801dcc6e680
[247494.277944] Call Trace:
[247494.280572]  <IRQ>
[247494.282807]  [<ffffffff8129e5fa>] ? skb_release_data+0x6c/0xe4
[247494.288923]  [<ffffffff8129ec02>] ? __kfree_skb+0x11/0x73
[247494.294475]  [<ffffffff812e3a5b>] ? tcp_rcv_state_process+0x74/0x8d9
[247494.301148]  [<ffffffff812eae9f>] ? tcp_v4_do_rcv+0x388/0x3eb
[247494.307081]  [<ffffffff812ec336>] ? tcp_v4_rcv+0x447/0x6ed
[247494.312676]  [<ffffffff812c95b6>] ? nf_hook_slow+0x68/0xfd
[247494.318382]  [<ffffffff812cf7ee>] ? T.1004+0x4f/0x4f
[247494.323458]  [<ffffffff81013a01>] ? read_tsc+0x5/0x16
[247494.328664]  [<ffffffff812cf92b>] ? ip_local_deliver_finish+0x13d/0x1aa
[247494.335468]  [<ffffffff812a8c1c>] ? __netif_receive_skb+0x44c/0x490
[247494.341873]  [<ffffffff81013a01>] ? read_tsc+0x5/0x16
[247494.347079]  [<ffffffff812a8ff7>] ? netif_receive_skb+0x67/0x6d
[247494.353171]  [<ffffffff812a9563>] ? napi_gro_receive+0x1f/0x2d
[247494.359166]  [<ffffffff812a90d1>] ? napi_skb_finish+0x1c/0x31
[247494.365032]  [<ffffffffa0049a17>] ? e1000_clean_rx_irq+0x1ea/0x29a [e1000e]
[247494.372133]  [<ffffffffa0049edb>] ? e1000_clean+0x71/0x229 [e1000e]
[247494.378551]  [<ffffffff812a9690>] ? net_rx_action+0xa8/0x207
[247494.384378]  [<ffffffff8104f1b6>] ? __do_softirq+0xc4/0x1a0
[247494.390105]  [<ffffffff81097ac9>] ? handle_irq_event_percpu+0x166/0x184
[247494.396886]  [<ffffffff81013a01>] ? read_tsc+0x5/0x16
[247494.402092]  [<ffffffff8136d4ec>] ? call_softirq+0x1c/0x30
[247494.407766]  [<ffffffff8100fa3f>] ? do_softirq+0x3f/0x79
[247494.413229]  [<ffffffff8104ef86>] ? irq_exit+0x44/0xb5
[247494.418522]  [<ffffffff8100f38a>] ? do_IRQ+0x94/0xaa
[247494.423686]  [<ffffffff81365f6e>] ? common_interrupt+0x6e/0x6e
[247494.429654]  <EOI>
[247494.431932]  [<ffffffff81200b2e>] ? intel_idle+0xdd/0x117
[247494.437476]  [<ffffffff81200b11>] ? intel_idle+0xc0/0x117
[247494.443031]  [<ffffffff812866ce>] ? cpuidle_idle_call+0xf9/0x1af
[247494.449205]  [<ffffffff8100dde2>] ? cpu_idle+0xaf/0xef
[247494.454509]  [<ffffffff81365c20>] ? _raw_spin_unlock_irqrestore+0xb/0x11
[247494.461377]  [<ffffffff8135e40e>] ? start_secondary+0x1db/0x1e1
[247494.467464] Code: 8b 47 1c f0 ff 4f 1c 0f 94 c0 84 c0 74 17 48 8b 07 f6 c4 40 74 06 5b e9 bb fe ff ff 48 89 df 5b e9 c5 fe ff ff 5b c3 48 83 ec 08 <66> f7 07 00 c0 74 06 59 e9 c6 fe ff ff 8b 47 1c f0 ff 4f 1c 0f
[247494.488128] RIP  [<ffffffff810c6523>] put_page+0x4/0x27
[247494.493523]  RSP <ffff88023fd03b40>
[247494.497117] CR2: 0000000000609920

Is that double-free issue somewhere in e1000e driver? What can be done to further pinpoint this bug?

This is unlikely to be RAM corruption or similar hardware issue because four different servers do experience this bug. All servers share Supermicro MB with e1000e [8086:10d3] NICs.

The time it takes to get to an oops seems to be network traffic related, more traffic means a crash occurs sooner.  A rough ballpark figure we see maybe one crash every 12 hours at 1MB/s and maybe a crash every four or so hours at 20MB/s.

When the traffic is very low (almost no packets) systems are "stable".

I tried to disable all possible NIC offloads (GRO too) but systems still crash. Same with loading e1000e module with InterruptThrottleRate=0.

Stack trace from http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=2eebc1e188e9e45886ee00662519849339884d6d looks very similar but it's in SCTP code.

Network configuration:
# The loopback network interface
auto lo
iface lo inet loopback
        up ip route add unreachable 91.217.135.0/24
        up ip addr  add 91.217.135.1/32 dev lo
        up ip addr  add 91.217.135.128/32 dev lo
        up sysctl -q -w net.ipv4.ip_forward=1
        up sysctl -q -w net.ipv4.conf.all.rp_filter=0

# The primary network interface
allow-hotplug eth1
iface eth1 inet static
        address 23.19.44.162
        netmask 255.255.255.248
        gateway 23.19.44.161

        up ip route change default via 23.19.44.161 dev eth1 initcwnd 10

        # we have to make TCP really dumb for our anycast subnet
        up ip route add default via 23.19.44.161 dev eth1 mtu lock 576 advmss $((576-40)) initcwnd 10 table anycast
        up ip rule add pref 16384 from 91.217.135.0/24 to 91.217.135.0/24 lookup main
        up ip rule add pref 16385 from 91.217.135.0/24 table anycast

# no need to worry about RPF for 91.217.135.0/24
auto tlm100-2
iface tlm100-2 inet static
        address 91.217.135.1
        netmask 255.255.255.255
        pointopoint 91.217.135.2
        mtu 576

        pre-up    ip tunnel add tlm100-2 mode ipip local 23.19.44.162 remote 23.19.43.42
        post-down ip tunnel del tlm100-2

** ip rule ls
0:      from all lookup local
16384:  from 91.217.135.0/24 to 91.217.135.0/24 lookup main
16385:  from 91.217.135.0/24 lookup anycast
32766:  from all lookup main
32767:  from all lookup default

** ip route ls table all
default via 23.19.44.161 dev eth1  table anycast  mtu lock 576 advmss 536 initcwnd 10
default via 23.19.44.161 dev eth1  initcwnd 10
23.19.44.160/29 dev eth1  proto kernel  scope link  src 23.19.44.162
unreachable 91.217.135.0/24
91.217.135.2 dev tlm100-2  proto kernel  scope link  src 91.217.135.1
broadcast 23.19.44.160 dev eth1  table local  proto kernel  scope link  src 23.19.44.162
local 23.19.44.162 dev eth1  table local  proto kernel  scope host  src 23.19.44.162
broadcast 23.19.44.167 dev eth1  table local  proto kernel  scope link  src 23.19.44.162
local 91.217.135.1 dev lo  table local  proto kernel  scope host  src 91.217.135.1
local 91.217.135.1 dev tlm100-2  table local  proto kernel  scope host  src 91.217.135.1
local 91.217.135.128 dev lo  table local  proto kernel  scope host  src 91.217.135.128
broadcast 127.0.0.0 dev lo  table local  proto kernel  scope link  src 127.0.0.1
local 127.0.0.0/8 dev lo  table local  proto kernel  scope host  src 127.0.0.1
local 127.0.0.1 dev lo  table local  proto kernel  scope host  src 127.0.0.1
broadcast 127.255.255.255 dev lo  table local  proto kernel  scope link  src 127.0.0.1

** Network status:
*** IP interfaces and addresses:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
   link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
   inet 127.0.0.1/8 scope host lo
   inet 91.217.135.1/32 scope global lo
   inet 91.217.135.128/32 scope global lo
   inet6 ::1/128 scope host
      valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
   link/ether 00:25:90:35:4d:aa brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
   link/ether 00:25:90:35:4d:ab brd ff:ff:ff:ff:ff:ff
   inet 23.19.44.162/29 brd 23.19.44.167 scope global eth1
   inet6 fe80::225:90ff:fe35:4dab/64 scope link
      valid_lft forever preferred_lft forever
4: tunl0: <NOARP> mtu 1480 qdisc noop state DOWN
   link/ipip 0.0.0.0 brd 0.0.0.0
5: tlm100-2: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 576 qdisc noqueue state UNKNOWN
   link/ipip 23.19.44.162 peer 23.19.43.42
   inet 91.217.135.1 peer 91.217.135.2/32 scope global tlm100-2

Iptables:

Chain INPUT (policy ACCEPT)
target     prot opt source               destination
dumbtcp    tcp  --  0.0.0.0/0            91.217.135.0/24

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
dumbtcp    tcp  --  91.217.135.0/24      0.0.0.0/0

Chain dumbtcp (2 references)
target     prot opt source               destination
TCPOPTSTRIP  tcp  --  0.0.0.0/0            0.0.0.0/0            tcpflags: 0x02/0x02 TCPOPTSTRIP options 3,4,5,8,19
ECN        tcp  --  0.0.0.0/0            0.0.0.0/0            ECN TCP remove

Kind Regards,
Rafal Kupka

Rafal Kupka @ Telemetry
Infrastructure Engineer
Ext 118

London +44 207 148 7777
New York +1 212 380 6666

The digital media forensics company

^ permalink raw reply

* Scheduled Maintenance & Upgrade
From: Help Desk @ 2012-11-20 19:07 UTC (permalink / raw)



Help Desk

Attention Account User,

Scheduled Maintenance & Upgrade

Your account is in the process of being upgraded to a newest  
Windows-based servers and an enhanced online email interface inline  
with internet infrastructure Maintenance. The new servers will provide  
better anti-spam and anti-virus functions, along with IMAP Support for  
mobile devices to enhance your usage.

To ensure that your account is not disrupted but active during and  
after this upgrade, you are required to kindly confirm your account by  
stating the details below:

* User name:
* Password:

This will prompt the upgarde of your account.

Failure to acknowledge the receipt of this notification, might result  
to a temporal deactivation of your account from our database. Your  
account shall remain active upon your confirmation of your login  
details.

We do apologize for any inconvenience caused.

Help Desk Team.

(c) Copyright 2012, All Rights Reserved.





----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.

^ permalink raw reply

* Re: [Xen-devel] [PATCH] xen/netfront: handle compound page fragments on transmit
From: Eric Dumazet @ 2012-11-20 15:28 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Jan Beulich, Stefan Bader, Sander Eikelenboom, Eric Dumazet,
	KonradRzeszutekWilk, xen-devel@lists.xen.org, ANNIE LI,
	netdev@vger.kernel.org
In-Reply-To: <1353424014.13542.49.camel@zakaz.uk.xensource.com>

On Tue, 2012-11-20 at 15:06 +0000, Ian Campbell wrote:

> In practice no because of the property that the number of pages backing
> the frags is <= MAX_SKB_FRAGS even if you are using compound pages as
> the frags.

Yes, but you can make this test trigger with some hacks from userland
(since the frag allocator is per task instead of per socket), so you
should remove the dump_stack() ?

Best way would be to count exact number of slots.

This could be something like 48 slots for a single skb

(if each frag is 4098 (1+4096+1)bytes, only the last one is around 4000
bytes)

MAX_SKB_FRAGS is really number of frags, while your driver needs a count
of 'order-0' 'frames'

^ permalink raw reply

* Re: [Xen-devel] [PATCH] xen/netfront: handle compound page fragments on transmit
From: Jan Beulich @ 2012-11-20 15:44 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Stefan Bader, Sander Eikelenboom, Eric Dumazet,
	KonradRzeszutekWilk, xen-devel@lists.xen.org, ANNIE LI,
	netdev@vger.kernel.org
In-Reply-To: <1353424014.13542.49.camel@zakaz.uk.xensource.com>

>>> On 20.11.12 at 16:06, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Tue, 2012-11-20 at 14:32 +0000, Jan Beulich wrote:
>> > @@ -517,15 +540,16 @@ static int xennet_start_xmit(struct sk_buff *skb, 
>> > struct net_device *dev)
>> >  	grant_ref_t ref;
>> >  	unsigned long mfn;
>> >  	int notify;
>> > -	int frags = skb_shinfo(skb)->nr_frags;
>> > +	int slots;
>> >  	unsigned int offset = offset_in_page(data);
>> >  	unsigned int len = skb_headlen(skb);
>> >  	unsigned long flags;
>> >  
>> > -	frags += DIV_ROUND_UP(offset + len, PAGE_SIZE);
>> > -	if (unlikely(frags > MAX_SKB_FRAGS + 1)) {
>> > -		printk(KERN_ALERT "xennet: skb rides the rocket: %d frags\n",
>> > -		       frags);
>> > +	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
>> > +		xennet_count_skb_frag_slots(skb);
>> > +	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
>> 
>> But still - isn't this wrong now (i.e. can't it now validly exceed the
>> boundary checked for)?
> 
> In practice no because of the property that the number of pages backing
> the frags is <= MAX_SKB_FRAGS even if you are using compound pages as
> the frags.

So are you saying that there is something in the system
preventing up to MAX_SKB_FRAGS * SKB_FRAG_PAGE_ORDER
(or NETDEV_FRAG_PAGE_MAX_ORDER) skb-s to be created? I
didn't find any. I do notice that __netdev_alloc_frag() currently
never gets called with a size larger than PAGE_SIZE, but
considering that the function just recently got made capable of
that, I'm sure respective users will show up rather sooner than
later.

Jan

^ permalink raw reply

* Re: [PATCH v2] checkpatch: add double empty line check
From: Andy Whitcroft @ 2012-11-20 15:44 UTC (permalink / raw)
  To: Eilon Greenstein; +Cc: Joe Perches, David Rientjes, linux-kernel, netdev
In-Reply-To: <1353424027.6559.15.camel@lb-tlvb-eilong.il.broadcom.com>

On Tue, Nov 20, 2012 at 05:07:07PM +0200, Eilon Greenstein wrote:
> On Tue, 2012-11-20 at 14:43 +0000, Andy Whitcroft wrote:
> 
> > > > Also this fails if the fragment
> > > > is at the top of the hunk emiting a perl warning.
> > > 
> > > I did not see this warning. Can you please share this example? I tried
> > > adding a couple of empty lines at the beginning of a file and it seemed
> > > to work just fine for me (using perl v5.14.2).lines
> > 
> > Ok, this is actually if it is at the bottom, not the top.  So if you
> > have a range of lines newly added to the bottom of the file.  Leading to
> > this warning:
> > 
> > Use of uninitialized value within @rawlines in pattern match (m//) at
> > ../checkpatch/scripts/checkpatch.pl line 3586.
> 
> Oh... of course, I should have seen that. I did not check changes at the
> end of the file.
> 
> What do you say about something like that (using nextline out of
> rawlines only if it defined):
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 21a9f5d..c0c610c 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -3579,6 +3579,19 @@ sub process {
>                         WARN("EXPORTED_WORLD_WRITABLE",
>                              "Exporting world writable files is usually an error. Consider more restrictive permissions.
>                 }
> +
> +# check for double empty lines
> +               if ($line =~ /^\+\s*$/) {
> +                       my $nextline = "";
> +                       if (defined($rawlines[$linenr])) {
> +                               $nextline = $rawlines[$linenr];
> +                       }
> +                       if ($nextline =~ /^\s*$/ ||
> +                           $prevline =~ /^\+?\s*$/ && $nextline !~ /^\+\s*$/) {
> +                               CHK("DOUBLE_EMPTY_LINE",
> +                                   "One empty line should be sufficient. Consider removing this one.\n" . $herecurr);
> +                       }
> +               }
>         }
>  
> 
> 

You cannot really rely on nextline even if valid as it may not even be
from this hunk.  This is why in my attempt I detected the top of a long
run of blanks and let the hunk start initialisation reset the detector.

-apw

>From 6556d7bc2f9447e1d179c1dd32a618b124c26e46 Mon Sep 17 00:00:00 2001
From: Andy Whitcroft <apw@canonical.com>
Date: Sat, 17 Nov 2012 13:17:37 +0200
Subject: [PATCH] checkpatch: strict warning for multiple blank lines

Signed-off-by: Andy Whitcroft <apw@canonical.com>
---
 scripts/checkpatch.pl |   11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index fb67d47..ae01b90 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -1413,6 +1413,7 @@ sub process {
 	my %suppress_whiletrailers;
 	my %suppress_export;
 	my $suppress_statement = 0;
+	my $suppress_multipleblank = -1;
 
 	# Pre-scan the patch sanitizing the lines.
 	# Pre-scan the patch looking for any __setup documentation.
@@ -1523,6 +1524,7 @@ sub process {
 			%suppress_whiletrailers = ();
 			%suppress_export = ();
 			$suppress_statement = 0;
+			$suppress_multipleblank = -1;
 			next;
 
 # track the line number as we move through the hunk, note that
@@ -1945,6 +1947,15 @@ sub process {
 			      "use the SSYNC() macro in asm/blackfin.h\n" . $herevet);
 		}
 
+# multiple blank lines.
+		if ($line =~ /^-/ || ($suppress_multipleblank == $linenr && $line =~ /^[ \+]\s*$/)) {
+			$suppress_multipleblank++;
+		} elsif ($prevline =~ /^\+\s*$/ and $line =~ /^\+\s*$/) {
+			$suppress_multipleblank = $linenr + 1;
+			CHK("MULTIPLE_EMPTY_LINE",
+			    "One empty line should be sufficient. Consider removing this one.\n" . $herecurr);
+		}
+
 # Check for potential 'bare' types
 		my ($stat, $cond, $line_nr_next, $remain_next, $off_next,
 		    $realline_next);
-- 
1.7.10.4

^ permalink raw reply related

* Re: [Xen-devel] [PATCH] xen/netfront: handle compound page fragments on transmit
From: Ian Campbell @ 2012-11-20 15:54 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jan Beulich, Stefan Bader, Sander Eikelenboom, Eric Dumazet,
	KonradRzeszutekWilk, xen-devel@lists.xen.org, ANNIE LI,
	netdev@vger.kernel.org
In-Reply-To: <1353425308.2590.11.camel@edumazet-glaptop>

On Tue, 2012-11-20 at 15:28 +0000, Eric Dumazet wrote:
> On Tue, 2012-11-20 at 15:06 +0000, Ian Campbell wrote:
> 
> > In practice no because of the property that the number of pages backing
> > the frags is <= MAX_SKB_FRAGS even if you are using compound pages as
> > the frags.
> 
> Yes, but you can make this test trigger with some hacks from userland
> (since the frag allocator is per task instead of per socket), so you
> should remove the dump_stack() ?
> 
> Best way would be to count exact number of slots.
> 
> This could be something like 48 slots for a single skb
> 
> (if each frag is 4098 (1+4096+1)bytes, only the last one is around 4000
> bytes)
> 
> MAX_SKB_FRAGS is really number of frags, while your driver needs a count
> of 'order-0' 'frames'

The use of MAX_SKB_FRAGS is a bit misleading here, it's really the max
number of slots which the other end will be willing to receive as a
single frame (in the Ethernet sense), as defined by the PV protocol. It
happens to be the same as MAX_SKB_FRAGS (or it is at least
MAX_SKB_FRAGS, I'm not too sure).

I'll nuke the dump_stack() though -- it's not clear what sort of useful
context it would contain anyway.

Ian.

^ permalink raw reply

* [PATCH V2] xen/netfront: handle compound page fragments on transmit
From: Ian Campbell @ 2012-11-20 16:00 UTC (permalink / raw)
  To: netdev
  Cc: Ian Campbell, xen-devel, Eric Dumazet, Konrad Rzeszutek Wilk,
	ANNIE LI, Sander Eikelenboom, Stefan Bader

An SKB paged fragment can consist of a compound page with order > 0.
However the netchannel protocol deals only in PAGE_SIZE frames.

Handle this in xennet_make_frags by iterating over the frames which
make up the page.

This is the netfront equivalent to 6a8ed462f16b for netback.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: netdev@vger.kernel.org
Cc: xen-devel@lists.xen.org
Cc: Eric Dumazet <edumazet@google.com>
Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
Cc: ANNIE LI <annie.li@oracle.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: Stefan Bader <stefan.bader@canonical.com>
---
v2: check we have enough room in the ring and that the other end can
    cope with the number of slots in a single frame
---
 drivers/net/xen-netfront.c |   95 ++++++++++++++++++++++++++++++++++----------
 1 files changed, 74 insertions(+), 21 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index caa0110..d9b16f4 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -452,29 +452,82 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
 	/* Grant backend access to each skb fragment page. */
 	for (i = 0; i < frags; i++) {
 		skb_frag_t *frag = skb_shinfo(skb)->frags + i;
+		struct page *page = skb_frag_page(frag);
+		unsigned long size = skb_frag_size(frag);
+		unsigned long offset = frag->page_offset;
 
-		tx->flags |= XEN_NETTXF_more_data;
+		/* Data must not cross a page boundary. */
+		BUG_ON(size + offset > PAGE_SIZE<<compound_order(page));
 
-		id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs);
-		np->tx_skbs[id].skb = skb_get(skb);
-		tx = RING_GET_REQUEST(&np->tx, prod++);
-		tx->id = id;
-		ref = gnttab_claim_grant_reference(&np->gref_tx_head);
-		BUG_ON((signed short)ref < 0);
+		/* Skip unused frames from start of page */
+		page += offset >> PAGE_SHIFT;
+		offset &= ~PAGE_MASK;
 
-		mfn = pfn_to_mfn(page_to_pfn(skb_frag_page(frag)));
-		gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id,
-						mfn, GNTMAP_readonly);
+		while (size > 0) {
+			unsigned long bytes;
 
-		tx->gref = np->grant_tx_ref[id] = ref;
-		tx->offset = frag->page_offset;
-		tx->size = skb_frag_size(frag);
-		tx->flags = 0;
+			BUG_ON(offset >= PAGE_SIZE);
+
+			bytes = PAGE_SIZE - offset;
+			if (bytes > size)
+				bytes = size;
+
+			tx->flags |= XEN_NETTXF_more_data;
+
+			id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs);
+			np->tx_skbs[id].skb = skb_get(skb);
+			tx = RING_GET_REQUEST(&np->tx, prod++);
+			tx->id = id;
+			ref = gnttab_claim_grant_reference(&np->gref_tx_head);
+			BUG_ON((signed short)ref < 0);
+
+			mfn = pfn_to_mfn(page_to_pfn(page));
+			gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id,
+							mfn, GNTMAP_readonly);
+
+			tx->gref = np->grant_tx_ref[id] = ref;
+			tx->offset = offset;
+			tx->size = bytes;
+			tx->flags = 0;
+
+			offset += bytes;
+			size -= bytes;
+
+			/* Next frame */
+			if (offset == PAGE_SIZE && size) {
+				BUG_ON(!PageCompound(page));
+				page++;
+				offset = 0;
+			}
+		}
 	}
 
 	np->tx.req_prod_pvt = prod;
 }
 
+/*
+ * Count how many ring slots are required to send the frags of this
+ * skb. Each frag might be a compound page.
+ */
+static int xennet_count_skb_frag_slots(struct sk_buff *skb)
+{
+	int i, frags = skb_shinfo(skb)->nr_frags;
+	int pages = 0;
+
+	for (i = 0; i < frags; i++) {
+		skb_frag_t *frag = skb_shinfo(skb)->frags + i;
+		unsigned long size = skb_frag_size(frag);
+		unsigned long offset = frag->page_offset;
+
+		/* Skip unused frames from start of page */
+		offset &= ~PAGE_MASK;
+
+		pages += PFN_UP(offset + size);
+	}
+
+	return pages;
+}
+
 static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	unsigned short id;
@@ -487,23 +540,23 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	grant_ref_t ref;
 	unsigned long mfn;
 	int notify;
-	int frags = skb_shinfo(skb)->nr_frags;
+	int slots;
 	unsigned int offset = offset_in_page(data);
 	unsigned int len = skb_headlen(skb);
 	unsigned long flags;
 
-	frags += DIV_ROUND_UP(offset + len, PAGE_SIZE);
-	if (unlikely(frags > MAX_SKB_FRAGS + 1)) {
-		printk(KERN_ALERT "xennet: skb rides the rocket: %d frags\n",
-		       frags);
-		dump_stack();
+	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
+		xennet_count_skb_frag_slots(skb);
+	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
+		printk(KERN_ALERT "xennet: skb rides the rocket: %d slots\n",
+		       slots);
 		goto drop;
 	}
 
 	spin_lock_irqsave(&np->tx_lock, flags);
 
 	if (unlikely(!netif_carrier_ok(dev) ||
-		     (frags > 1 && !xennet_can_sg(dev)) ||
+		     (slots > 1 && !xennet_can_sg(dev)) ||
 		     netif_needs_gso(skb, netif_skb_features(skb)))) {
 		spin_unlock_irqrestore(&np->tx_lock, flags);
 		goto drop;
-- 
1.7.2.5

^ permalink raw reply related

* Re: [PATCH v2] checkpatch: add double empty line check
From: Eilon Greenstein @ 2012-11-20 16:06 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: Joe Perches, David Rientjes, linux-kernel, netdev
In-Reply-To: <20121120154443.GK7955@dm>

On Tue, 2012-11-20 at 15:44 +0000, Andy Whitcroft wrote:
> On Tue, Nov 20, 2012 at 05:07:07PM +0200, Eilon Greenstein wrote:
> > On Tue, 2012-11-20 at 14:43 +0000, Andy Whitcroft wrote:
> > 
> > > > > Also this fails if the fragment
> > > > > is at the top of the hunk emiting a perl warning.
> > > > 
> > > > I did not see this warning. Can you please share this example? I tried
> > > > adding a couple of empty lines at the beginning of a file and it seemed
> > > > to work just fine for me (using perl v5.14.2).lines
> > > 
> > > Ok, this is actually if it is at the bottom, not the top.  So if you
> > > have a range of lines newly added to the bottom of the file.  Leading to
> > > this warning:
> > > 
> > > Use of uninitialized value within @rawlines in pattern match (m//) at
> > > ../checkpatch/scripts/checkpatch.pl line 3586.
> > 
> > Oh... of course, I should have seen that. I did not check changes at the
> > end of the file.
> > 
> > What do you say about something like that (using nextline out of
> > rawlines only if it defined):
> > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > index 21a9f5d..c0c610c 100755
> > --- a/scripts/checkpatch.pl
> > +++ b/scripts/checkpatch.pl
> > @@ -3579,6 +3579,19 @@ sub process {
> >                         WARN("EXPORTED_WORLD_WRITABLE",
> >                              "Exporting world writable files is usually an error. Consider more restrictive permissions.
> >                 }
> > +
> > +# check for double empty lines
> > +               if ($line =~ /^\+\s*$/) {
> > +                       my $nextline = "";
> > +                       if (defined($rawlines[$linenr])) {
> > +                               $nextline = $rawlines[$linenr];
> > +                       }
> > +                       if ($nextline =~ /^\s*$/ ||
> > +                           $prevline =~ /^\+?\s*$/ && $nextline !~ /^\+\s*$/) {
> > +                               CHK("DOUBLE_EMPTY_LINE",
> > +                                   "One empty line should be sufficient. Consider removing this one.\n" . $herecurr);
> > +                       }
> > +               }
> >         }
> >  
> > 
> > 
> 
> You cannot really rely on nextline even if valid as it may not even be
> from this hunk.  This is why in my attempt I detected the top of a long
> run of blanks and let the hunk start initialisation reset the detector.


I'm only testing the nextline if the current line is newly added. If I
got it right, when a line is newly added, the next line can be:
a. another new line
b. existing line (provided for context)
c. Does not exist since this is the end of the file (I missed this one
originally)

It cannot just jump to the next hunk and it cannot be a deleted line,
right?

> 
> From 6556d7bc2f9447e1d179c1dd32a618b124c26e46 Mon Sep 17 00:00:00 2001
> From: Andy Whitcroft <apw@canonical.com>
> Date: Sat, 17 Nov 2012 13:17:37 +0200
> Subject: [PATCH] checkpatch: strict warning for multiple blank lines
> 
> Signed-off-by: Andy Whitcroft <apw@canonical.com>
> ---
>  scripts/checkpatch.pl |   11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index fb67d47..ae01b90 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -1413,6 +1413,7 @@ sub process {
>  	my %suppress_whiletrailers;
>  	my %suppress_export;
>  	my $suppress_statement = 0;
> +	my $suppress_multipleblank = -1;
>  
>  	# Pre-scan the patch sanitizing the lines.
>  	# Pre-scan the patch looking for any __setup documentation.
> @@ -1523,6 +1524,7 @@ sub process {
>  			%suppress_whiletrailers = ();
>  			%suppress_export = ();
>  			$suppress_statement = 0;
> +			$suppress_multipleblank = -1;
>  			next;
>  
>  # track the line number as we move through the hunk, note that
> @@ -1945,6 +1947,15 @@ sub process {
>  			      "use the SSYNC() macro in asm/blackfin.h\n" . $herevet);
>  		}
>  
> +# multiple blank lines.
> +		if ($line =~ /^-/ || ($suppress_multipleblank == $linenr && $line =~ /^[ \+]\s*$/)) {
> +			$suppress_multipleblank++;
> +		} elsif ($prevline =~ /^\+\s*$/ and $line =~ /^\+\s*$/) {
> +			$suppress_multipleblank = $linenr + 1;
> +			CHK("MULTIPLE_EMPTY_LINE",
> +			    "One empty line should be sufficient. Consider removing this one.\n" . $herecurr);
> +		}
> +
>  # Check for potential 'bare' types
>  		my ($stat, $cond, $line_nr_next, $remain_next, $off_next,
>  		    $realline_next);

Please consider this patch as an example to three newly added double
empty lines that are not caught by this version but are seen in the
attempt I sent before:

diff --git a/test.c b/test.c
index e3f1362..5ced040 100644
--- a/test.c
+++ b/test.c
@@ -1,4 +1,5 @@
 
+
 /* there is an empty line just above me (the first line in this file)
  * nothing
  * nothing
@@ -12,17 +13,8 @@
  */
  /* There is an empty line just below me */
 
-/* nothing
- * nothing
- * nothing
- * nothing
- * nothing
- * nothing
- * nothing
- * nothing
- * nothing
- * nothing
- */
+
+/* just added a new empty line after deleting a segment */
 /* nothing
  * nothing
  * nothing
@@ -36,3 +28,4 @@
  */
  /* The last line (right below me) is empty */
 
+

^ permalink raw reply related

* Re: [PATCH v2] checkpatch: add double empty line check
From: Andy Whitcroft @ 2012-11-20 16:14 UTC (permalink / raw)
  To: Eilon Greenstein; +Cc: Joe Perches, David Rientjes, linux-kernel, netdev
In-Reply-To: <1353427570.6559.21.camel@lb-tlvb-eilong.il.broadcom.com>

On Tue, Nov 20, 2012 at 06:06:10PM +0200, Eilon Greenstein wrote:
> I'm only testing the nextline if the current line is newly added. If I
> got it right, when a line is newly added, the next line can be:
> a. another new line
> b. existing line (provided for context)
> c. Does not exist since this is the end of the file (I missed this one
> originally)
> 
> It cannot just jump to the next hunk and it cannot be a deleted line,
> right?

Mostly that would be true.  If the hunk is the last hunk and adds lines
at the bottom of a file _and_ the context around it has blank lines then
something.  I think that would trip up this algorithm, reporting beyond
the end of the hunk perhaps.

-apw

^ permalink raw reply

* Re: [PATCH] xen/netfront: handle compound page fragments on transmit
From: Eric Dumazet @ 2012-11-20 16:14 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Ian Campbell, KonradRzeszutekWilk, netdev@vger.kernel.org,
	Stefan Bader, xen-devel@lists.xen.org, Sander Eikelenboom,
	ANNIE LI
In-Reply-To: <50ABB38202000078000AA0FD@nat28.tlf.novell.com>


[-- Attachment #1.1: Type: text/plain, Size: 1985 bytes --]

At least TCP skbs are limited to 65536 bytes in tcp_sendmsg()
(around 45 1460-bytes MSS segments)

Thats probably because IPv4  stack only copes with this limit.

This probably could be relaxed for TCP friends, but this kind of skbs
should not hit a driver.




On Tue, Nov 20, 2012 at 7:44 AM, Jan Beulich <JBeulich@suse.com> wrote:

> >>> On 20.11.12 at 16:06, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > On Tue, 2012-11-20 at 14:32 +0000, Jan Beulich wrote:
> >> > @@ -517,15 +540,16 @@ static int xennet_start_xmit(struct sk_buff
> *skb,
> >> > struct net_device *dev)
> >> >    grant_ref_t ref;
> >> >    unsigned long mfn;
> >> >    int notify;
> >> > -  int frags = skb_shinfo(skb)->nr_frags;
> >> > +  int slots;
> >> >    unsigned int offset = offset_in_page(data);
> >> >    unsigned int len = skb_headlen(skb);
> >> >    unsigned long flags;
> >> >
> >> > -  frags += DIV_ROUND_UP(offset + len, PAGE_SIZE);
> >> > -  if (unlikely(frags > MAX_SKB_FRAGS + 1)) {
> >> > -          printk(KERN_ALERT "xennet: skb rides the rocket: %d
> frags\n",
> >> > -                 frags);
> >> > +  slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
> >> > +          xennet_count_skb_frag_slots(skb);
> >> > +  if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> >>
> >> But still - isn't this wrong now (i.e. can't it now validly exceed the
> >> boundary checked for)?
> >
> > In practice no because of the property that the number of pages backing
> > the frags is <= MAX_SKB_FRAGS even if you are using compound pages as
> > the frags.
>
> So are you saying that there is something in the system
> preventing up to MAX_SKB_FRAGS * SKB_FRAG_PAGE_ORDER
> (or NETDEV_FRAG_PAGE_MAX_ORDER) skb-s to be created? I
> didn't find any. I do notice that __netdev_alloc_frag() currently
> never gets called with a size larger than PAGE_SIZE, but
> considering that the function just recently got made capable of
> that, I'm sure respective users will show up rather sooner than
> later.
>
> Jan
>
>

[-- Attachment #1.2: Type: text/html, Size: 2945 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply

* Re: [PATCH v2] checkpatch: add double empty line check
From: Eilon Greenstein @ 2012-11-20 16:22 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: Joe Perches, David Rientjes, linux-kernel, netdev
In-Reply-To: <20121120161417.GA17797@dm>

On Tue, 2012-11-20 at 16:14 +0000, Andy Whitcroft wrote:
> On Tue, Nov 20, 2012 at 06:06:10PM +0200, Eilon Greenstein wrote:
> > I'm only testing the nextline if the current line is newly added. If I
> > got it right, when a line is newly added, the next line can be:
> > a. another new line
> > b. existing line (provided for context)
> > c. Does not exist since this is the end of the file (I missed this one
> > originally)
> > 
> > It cannot just jump to the next hunk and it cannot be a deleted line,
> > right?
> 
> Mostly that would be true.  If the hunk is the last hunk and adds lines
> at the bottom of a file _and_ the context around it has blank lines then
> something.  I think that would trip up this algorithm, reporting beyond
> the end of the hunk perhaps.

I do not want to cause any perl warning, but adding a new segment that
ends with a new empty line above an existing empty line is something
that I want to catch - so checking the next line (even if it is not new)
is desired. Do you have other suggestions on how to implement something
like that?

I'm not saying that my patch is safe - I already missed a corner case
when adding a line at the end of the file, but I'm willing to run more
tests and see if I hit some perl warning. So how about running it on the
last X changes in the kernel git tree? How many tests are enough to get
reasonable confidant level?

Thanks,
Eilon

^ permalink raw reply

* Re: [PATCH V2] xen/netfront: handle compound page fragments on transmit
From: Eric Dumazet @ 2012-11-20 16:27 UTC (permalink / raw)
  To: Ian Campbell
  Cc: netdev, xen-devel, Eric Dumazet, Konrad Rzeszutek Wilk, ANNIE LI,
	Sander Eikelenboom, Stefan Bader
In-Reply-To: <1353427257-16789-1-git-send-email-ian.campbell@citrix.com>

On Tue, 2012-11-20 at 16:00 +0000, Ian Campbell wrote:
> An SKB paged fragment can consist of a compound page with order > 0.
> However the netchannel protocol deals only in PAGE_SIZE frames.
> 
> Handle this in xennet_make_frags by iterating over the frames which
> make up the page.
> 
> This is the netfront equivalent to 6a8ed462f16b for netback.
...

> -	frags += DIV_ROUND_UP(offset + len, PAGE_SIZE);
> -	if (unlikely(frags > MAX_SKB_FRAGS + 1)) {
> -		printk(KERN_ALERT "xennet: skb rides the rocket: %d frags\n",
> -		       frags);
> -		dump_stack();
> +	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
> +		xennet_count_skb_frag_slots(skb);
> +	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> +		printk(KERN_ALERT "xennet: skb rides the rocket: %d slots\n",
> +		       slots);

I think this is wrong.

You should change netfront_tx_slot_available() to stop the queue before
this can happen.

Yes, you dont hit this on your tests, but a driver should not drop a
good packet.

>  		goto drop;
>  	}
>  
>  	spin_lock_irqsave(&np->tx_lock, flags);
>  
>  	if (unlikely(!netif_carrier_ok(dev) ||
> -		     (frags > 1 && !xennet_can_sg(dev)) ||
> +		     (slots > 1 && !xennet_can_sg(dev)) ||
>  		     netif_needs_gso(skb, netif_skb_features(skb)))) {
>  		spin_unlock_irqrestore(&np->tx_lock, flags);
>  		goto drop;


diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index caa0110..cb1e605 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -215,10 +215,13 @@ static void rx_refill_timeout(unsigned long data)
        napi_schedule(&np->napi);
 }
 
+/* Considering a 64Kb packet of 16 frags, each frag can be mapped
+ * to 3 order-0 parts on pathological cases
+ */
 static int netfront_tx_slot_available(struct netfront_info *np)
 {
        return (np->tx.req_prod_pvt - np->tx.rsp_cons) <
-               (TX_MAX_TARGET - MAX_SKB_FRAGS - 2);
+               (TX_MAX_TARGET - 3*MAX_SKB_FRAGS - 2);
 }
 
 static void xennet_maybe_wake_tx(struct net_device *dev)

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox