Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] pkt_sched: QFQ Plus: fair-queueing service at DRR cost
From: Paolo Valente @ 2012-11-20 17:45 UTC (permalink / raw)
  To: David Miller; +Cc: jhs, shemminger, linux-kernel, netdev, rizzo, fchecconi
In-Reply-To: <20121119.184828.628106002307042971.davem@davemloft.net>

Il 20/11/2012 00:48, David Miller ha scritto:
> From: Paolo Valente <paolo.valente@unimore.it>
> Date: Mon, 12 Nov 2012 17:48:33 +0100
>
>> [This patch received positive feedback from Stephen Hemminger ("put in
>> net-next"), but no further feedback or decision. So I am (re)sending
>> an updated version of it. The only differences with respect to the
>> previous version are the support for TSO/GSO (taken from QFQ), and a
>> hopefully improved description.]
>
> Can you rearrange the logic so that the compiler doesn't emit this
> warning?
>
> In file included from net/sched/sch_qfq.c:18:0:
> net/sched/sch_qfq.c: In function ‘qfq_dequeue’:
> include/net/sch_generic.h:480:15: warning: ‘skb’ may be used uninitialized in this function [-Wmaybe-uninitialized]
> net/sched/sch_qfq.c:1007:18: note: ‘skb’ was declared here
>
> You and I both know that SKB will be initialized at this point, but
> the compiler can't see it clearly enough.
>
Unfortunately I could not reproduce the warning (with
gcc-4.7 -Wmaybe-uninitialized). I am however about to send a new version 
with skb initialized to NULL. I hope that this fix properly addresses 
this issue.
> Thanks.
>


-- 
-----------------------------------------------------------
| Paolo Valente              |                            |
| Algogroup                  |                            |
| Dip. Ing. Informazione     | tel:   +39 059 2056318     |
| Via Vignolese 905/b        | fax:   +39 059 2056129     |
| 41125 Modena - Italy       |                            |
|     home:  http://algo.ing.unimo.it/people/paolo/       |
-----------------------------------------------------------

^ permalink raw reply

* RE: question about eth_header
From: Dmitry Kravkov @ 2012-11-20 17:39 UTC (permalink / raw)
  To: Rami Rosen; +Cc: netdev@vger.kernel.org
In-Reply-To: <CAKoUArnRnx0txGWhNWpZAFBijWmtzdOe-D7Qb62-5AvBCfgHJQ@mail.gmail.com>

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Rami Rosen
> Sent: Tuesday, November 20, 2012 4:51 PM
> To: Dmitry Kravkov
> Cc: netdev@vger.kernel.org
> Subject: Re: question about eth_header
> 
> Hi,
> I think that you should call skb_reset_mac_header()
> before calling eth_hdr() and that skb_reset_mac_header() should not be
> inside eth_hdr(). Following is explanation:
> 
> In the RX path, when we get a packet in the driver, what essentially
> happens is that at this point, we have a pointer to
> the packet buffer in skb->data, and its length in skb->len.

Actually I'm on TX path, and do not intend  to update skb, just received it 
for transmission, I want to pick mac addresess from eth header.
Now bnx2x does it by accessing skb->data (which points to the correct mac)
and I wanted to replace it with eth_hrd instead of doing ugly casting in driver code.
Moreover, in the future, when inner_mac_address will be used for tunneling
acceleration, inner_mac_address will be updated with mac_address offset, and so
may also point to the improper location.   
 
> With ethernet packets, the driver will call eth_type_trans(); please
> look at ethernet drivers code.
> eth_type_trans() indeed sets the MAC header pointer by calling
> skb_reset_mac_header() on the skb.
> 
> Afterwards, the eth_type_trans() will advance skb->data by
> 14, the size of the Ethernet header, and decrease skb->len
> by calling skb_pull_inline(skb, ETH_HLEN).
> 
> In case you are wondering why the skb_reset_mac_header() is not inside
> the eth_hdr(), I think that the reason is that, once
> skb_reset_mac_header() on the skb was called, there are cases when we
> want to make several calls eth_hdr() without each time again resetting
> the mac header, for example, in the bridging code.  There is no reason
> to doing it in terms of performance.

My question is about eth_header() function which is create() callback for
eth_header_ops.

Thanks.

^ permalink raw reply

* RE: [PATCH] ixgbe: fix broken PPTP handling
From: Keller, Jacob E @ 2012-11-20 17:27 UTC (permalink / raw)
  To: Vick, Matthew, Alan Cox, netdev@vger.kernel.org
  Cc: e1000-devel@lists.sourceforge.net
In-Reply-To: <06DFBC1E25D8024DB214DC7F41A3CD34489616B9@ORSMSX101.amr.corp.intel.com>

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Vick, Matthew
> Sent: Tuesday, November 20, 2012 9:12 AM
> To: Alan Cox; netdev@vger.kernel.org
> Cc: e1000-devel@lists.sourceforge.net
> Subject: RE: [PATCH] ixgbe: fix broken PPTP handling
> 
> > -----Original Message-----
> > From: netdev-owner@vger.kernel.org [mailto:netdev-
> > owner@vger.kernel.org] On Behalf Of Alan Cox
> > Sent: Tuesday, November 20, 2012 8:33 AM
> > To: netdev@vger.kernel.org
> > Subject: [PATCH] ixgbe: fix broken PPTP handling
> >
> > From: Alan Cox <alan@linux.intel.com>
> >
> > Reported to the maintain 3 weeks ago so now sending a patch rather than
> > waiting. ixgbe passes a random event type to the pptp code
> >
> > Signed-off-by: Alan Cox <alan@linux.intel.com>
> > ---
> >
> >  drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c |    1 +
> >  1 file changed, 1 insertion(+)
> >
> >
> > diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
> > b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
> > index 01d99af..73291fe 100644
> > --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
> > +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
> > @@ -398,6 +398,7 @@ void ixgbe_ptp_check_pps_event(struct ixgbe_adapter
> > *adapter, u32 eicr)
> >
> >  	switch (hw->mac.type) {
> >  	case ixgbe_mac_X540:
> > +		event.type = PTP_CLOCK_PPS;
> >  		ptp_clock_event(adapter->ptp_clock, &event);
> >  		break;
> >  	default:
> 
> CCing e1000-devel, the mailing list for Intel Wired Ethernet support.
> 
> Sorry, Alan--I didn't see any message from you about this before or I
> would have responded. NAK on this, since I don't see how ixgbe is passing
> a random event type. The event type gets set to PTP_CLOCK_PPS just a few
> lines above (before the if (!adapter->ptp_clock) block).
> 
> Matthew

Specifically, commit 3645adbb "ixgbe: fix uninitialized event.type in ixgbe_ptp_check_pps_event"

:)

Sorry I didn't reply directly to your original email with the concern, or CC you. I will do so next time.

Thanks

- Jake

^ permalink raw reply

* Re: [RESEND PATCH net-next 1/1] sit: allow to configure 6rd tunnels via netlink
From: Nicolas Dichtel @ 2012-11-20 17:22 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: davem, netdev
In-Reply-To: <20121120091748.359d95db@nehalam.linuxnetplumber.net>

Le 20/11/2012 18:17, Stephen Hemminger a écrit :
> On Tue, 20 Nov 2012 18:11:41 +0100
> Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
>
>> Le 20/11/2012 18:01, Stephen Hemminger a écrit :
>>> On Tue, 20 Nov 2012 09:41:45 +0100
>>> Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
>>>
>>>> This patch add the support of 6RD tunnels management via netlink.
>>>> Note that netdev_state_change() is now called when 6RD parameters are updated.
>>>>
>>>> 6RD parameters are updated only if there is at least one 6RD attribute.
>>>>
>>>> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
>>>> ---
>>>>    include/uapi/linux/if_tunnel.h |   4 ++
>>>>    net/ipv6/sit.c                 | 149 ++++++++++++++++++++++++++++++++++-------
>>>>    2 files changed, 128 insertions(+), 25 deletions(-)
>>>>
>>>
>>> Ok, but iproute2 git is currently for 3.6
>>> Please resubmit when 3.7 is finished during 3.8 merge window.
>> Hmm, not sure to understand. This is a kernel patch.
>
>
> I assumed you would need a user space part as well.
>
Oh yes. And I was waiting the proper time to send patches, like for netconf ;-)

^ permalink raw reply

* Re: [RESEND PATCH net-next 1/1] sit: allow to configure 6rd tunnels via netlink
From: Stephen Hemminger @ 2012-11-20 17:17 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: davem, netdev
In-Reply-To: <50ABB9CD.6080506@6wind.com>

On Tue, 20 Nov 2012 18:11:41 +0100
Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:

> Le 20/11/2012 18:01, Stephen Hemminger a écrit :
> > On Tue, 20 Nov 2012 09:41:45 +0100
> > Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
> >
> >> This patch add the support of 6RD tunnels management via netlink.
> >> Note that netdev_state_change() is now called when 6RD parameters are updated.
> >>
> >> 6RD parameters are updated only if there is at least one 6RD attribute.
> >>
> >> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> >> ---
> >>   include/uapi/linux/if_tunnel.h |   4 ++
> >>   net/ipv6/sit.c                 | 149 ++++++++++++++++++++++++++++++++++-------
> >>   2 files changed, 128 insertions(+), 25 deletions(-)
> >>
> >
> > Ok, but iproute2 git is currently for 3.6
> > Please resubmit when 3.7 is finished during 3.8 merge window.
> Hmm, not sure to understand. This is a kernel patch.


I assumed you would need a user space part as well.

^ permalink raw reply

* RE: [PATCH] ixgbe: fix broken PPTP handling
From: Vick, Matthew @ 2012-11-20 17:12 UTC (permalink / raw)
  To: Alan Cox, netdev@vger.kernel.org; +Cc: e1000-devel@lists.sourceforge.net
In-Reply-To: <20121120163227.11935.57062.stgit@bob.linux.org.uk>

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of Alan Cox
> Sent: Tuesday, November 20, 2012 8:33 AM
> To: netdev@vger.kernel.org
> Subject: [PATCH] ixgbe: fix broken PPTP handling
> 
> From: Alan Cox <alan@linux.intel.com>
> 
> Reported to the maintain 3 weeks ago so now sending a patch rather than
> waiting. ixgbe passes a random event type to the pptp code
> 
> Signed-off-by: Alan Cox <alan@linux.intel.com>
> ---
> 
>  drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> 
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
> b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
> index 01d99af..73291fe 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
> @@ -398,6 +398,7 @@ void ixgbe_ptp_check_pps_event(struct ixgbe_adapter
> *adapter, u32 eicr)
> 
>  	switch (hw->mac.type) {
>  	case ixgbe_mac_X540:
> +		event.type = PTP_CLOCK_PPS;
>  		ptp_clock_event(adapter->ptp_clock, &event);
>  		break;
>  	default:

CCing e1000-devel, the mailing list for Intel Wired Ethernet support.

Sorry, Alan--I didn't see any message from you about this before or I would have responded. NAK on this, since I don't see how ixgbe is passing a random event type. The event type gets set to PTP_CLOCK_PPS just a few lines above (before the if (!adapter->ptp_clock) block).

Matthew

^ permalink raw reply

* Re: [RESEND PATCH net-next 1/1] sit: allow to configure 6rd tunnels via netlink
From: Nicolas Dichtel @ 2012-11-20 17:11 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: davem, netdev
In-Reply-To: <20121120090113.47ce16b5@nehalam.linuxnetplumber.net>

Le 20/11/2012 18:01, Stephen Hemminger a écrit :
> On Tue, 20 Nov 2012 09:41:45 +0100
> Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
>
>> This patch add the support of 6RD tunnels management via netlink.
>> Note that netdev_state_change() is now called when 6RD parameters are updated.
>>
>> 6RD parameters are updated only if there is at least one 6RD attribute.
>>
>> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
>> ---
>>   include/uapi/linux/if_tunnel.h |   4 ++
>>   net/ipv6/sit.c                 | 149 ++++++++++++++++++++++++++++++++++-------
>>   2 files changed, 128 insertions(+), 25 deletions(-)
>>
>
> Ok, but iproute2 git is currently for 3.6
> Please resubmit when 3.7 is finished during 3.8 merge window.
Hmm, not sure to understand. This is a kernel patch.

^ permalink raw reply

* Re: [RESEND PATCH net-next 1/1] sit: allow to configure 6rd tunnels via netlink
From: Stephen Hemminger @ 2012-11-20 17:01 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: davem, netdev
In-Reply-To: <1353400905-28335-1-git-send-email-nicolas.dichtel@6wind.com>

On Tue, 20 Nov 2012 09:41:45 +0100
Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:

> This patch add the support of 6RD tunnels management via netlink.
> Note that netdev_state_change() is now called when 6RD parameters are updated.
> 
> 6RD parameters are updated only if there is at least one 6RD attribute.
> 
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> ---
>  include/uapi/linux/if_tunnel.h |   4 ++
>  net/ipv6/sit.c                 | 149 ++++++++++++++++++++++++++++++++++-------
>  2 files changed, 128 insertions(+), 25 deletions(-)
> 

Ok, but iproute2 git is currently for 3.6
Please resubmit when 3.7 is finished during 3.8 merge window.

^ permalink raw reply

* Re: [PATCH v2] checkpatch: add double empty line check
From: Andy Whitcroft @ 2012-11-20 16:36 UTC (permalink / raw)
  To: Eilon Greenstein; +Cc: Joe Perches, David Rientjes, linux-kernel, netdev
In-Reply-To: <1353428544.6559.26.camel@lb-tlvb-eilong.il.broadcom.com>

On Tue, Nov 20, 2012 at 06:22:24PM +0200, Eilon Greenstein wrote:
> On Tue, 2012-11-20 at 16:14 +0000, Andy Whitcroft wrote:
> > On Tue, Nov 20, 2012 at 06:06:10PM +0200, Eilon Greenstein wrote:
> > > I'm only testing the nextline if the current line is newly added. If I
> > > got it right, when a line is newly added, the next line can be:
> > > a. another new line
> > > b. existing line (provided for context)
> > > c. Does not exist since this is the end of the file (I missed this one
> > > originally)
> > > 
> > > It cannot just jump to the next hunk and it cannot be a deleted line,
> > > right?
> > 
> > Mostly that would be true.  If the hunk is the last hunk and adds lines
> > at the bottom of a file _and_ the context around it has blank lines then
> > something.  I think that would trip up this algorithm, reporting beyond
> > the end of the hunk perhaps.
> 
> I do not want to cause any perl warning, but adding a new segment that
> ends with a new empty line above an existing empty line is something
> that I want to catch - so checking the next line (even if it is not new)
> is desired. Do you have other suggestions on how to implement something
> like that?
> 
> I'm not saying that my patch is safe - I already missed a corner case
> when adding a line at the end of the file, but I'm willing to run more
> tests and see if I hit some perl warning. So how about running it on the
> last X changes in the kernel git tree? How many tests are enough to get
> reasonable confidant level?

I have been testing the patches there with some fake files to try and
catch these indeed.  I did incldue my take on how to solve this in
previous replies.

-apw

^ permalink raw reply

* Re: [PATCH v2] checkpatch: add double empty line check
From: Andy Whitcroft @ 2012-11-20 16:36 UTC (permalink / raw)
  To: Eilon Greenstein; +Cc: Joe Perches, David Rientjes, linux-kernel, netdev
In-Reply-To: <20121120161417.GA17797@dm>

On Tue, Nov 20, 2012 at 04:14:17PM +0000, Andy Whitcroft wrote:
> On Tue, Nov 20, 2012 at 06:06:10PM +0200, Eilon Greenstein wrote:
> > I'm only testing the nextline if the current line is newly added. If I
> > got it right, when a line is newly added, the next line can be:
> > a. another new line
> > b. existing line (provided for context)
> > c. Does not exist since this is the end of the file (I missed this one
> > originally)
> > 
> > It cannot just jump to the next hunk and it cannot be a deleted line,
> > right?

Oh and in theory at least it could be a - line, though diff never
generates things in that order.

-apw

^ permalink raw reply

* [PATCH] ixgbe: fix broken PPTP handling
From: Alan Cox @ 2012-11-20 16:32 UTC (permalink / raw)
  To: netdev

From: Alan Cox <alan@linux.intel.com>

Reported to the maintain 3 weeks ago so now sending a patch rather
than waiting. ixgbe passes a random event type to the pptp code

Signed-off-by: Alan Cox <alan@linux.intel.com>
---

 drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c |    1 +
 1 file changed, 1 insertion(+)


diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
index 01d99af..73291fe 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
@@ -398,6 +398,7 @@ void ixgbe_ptp_check_pps_event(struct ixgbe_adapter *adapter, u32 eicr)
 
 	switch (hw->mac.type) {
 	case ixgbe_mac_X540:
+		event.type = PTP_CLOCK_PPS;
 		ptp_clock_event(adapter->ptp_clock, &event);
 		break;
 	default:

^ permalink raw reply related

* [PATCH] ne2000: add the right platform device
From: Alan Cox @ 2012-11-20 16:31 UTC (permalink / raw)
  To: netdev

From: Alan Cox <alan@linux.intel.com>

Without this udev doesn't have a way to key the ne device to the platform
device.

Signed-off-by: Alan Cox <alan@linux.intel.com>
---

 drivers/net/ethernet/8390/ne.c |    1 +
 1 file changed, 1 insertion(+)


diff --git a/drivers/net/ethernet/8390/ne.c b/drivers/net/ethernet/8390/ne.c
index d04911d..47618e5 100644
--- a/drivers/net/ethernet/8390/ne.c
+++ b/drivers/net/ethernet/8390/ne.c
@@ -813,6 +813,7 @@ static int __init ne_drv_probe(struct platform_device *pdev)
 		dev->irq = irq[this_dev];
 		dev->mem_end = bad[this_dev];
 	}
+	SET_NETDEV_DEV(dev, &pdev->dev);
 	err = do_ne_probe(dev);
 	if (err) {
 		free_netdev(dev);

^ permalink raw reply related

* Re: [PATCH V2] xen/netfront: handle compound page fragments on transmit
From: Eric Dumazet @ 2012-11-20 16:27 UTC (permalink / raw)
  To: Ian Campbell
  Cc: netdev, xen-devel, Eric Dumazet, Konrad Rzeszutek Wilk, ANNIE LI,
	Sander Eikelenboom, Stefan Bader
In-Reply-To: <1353427257-16789-1-git-send-email-ian.campbell@citrix.com>

On Tue, 2012-11-20 at 16:00 +0000, Ian Campbell wrote:
> An SKB paged fragment can consist of a compound page with order > 0.
> However the netchannel protocol deals only in PAGE_SIZE frames.
> 
> Handle this in xennet_make_frags by iterating over the frames which
> make up the page.
> 
> This is the netfront equivalent to 6a8ed462f16b for netback.
...

> -	frags += DIV_ROUND_UP(offset + len, PAGE_SIZE);
> -	if (unlikely(frags > MAX_SKB_FRAGS + 1)) {
> -		printk(KERN_ALERT "xennet: skb rides the rocket: %d frags\n",
> -		       frags);
> -		dump_stack();
> +	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
> +		xennet_count_skb_frag_slots(skb);
> +	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> +		printk(KERN_ALERT "xennet: skb rides the rocket: %d slots\n",
> +		       slots);

I think this is wrong.

You should change netfront_tx_slot_available() to stop the queue before
this can happen.

Yes, you dont hit this on your tests, but a driver should not drop a
good packet.

>  		goto drop;
>  	}
>  
>  	spin_lock_irqsave(&np->tx_lock, flags);
>  
>  	if (unlikely(!netif_carrier_ok(dev) ||
> -		     (frags > 1 && !xennet_can_sg(dev)) ||
> +		     (slots > 1 && !xennet_can_sg(dev)) ||
>  		     netif_needs_gso(skb, netif_skb_features(skb)))) {
>  		spin_unlock_irqrestore(&np->tx_lock, flags);
>  		goto drop;


diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index caa0110..cb1e605 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -215,10 +215,13 @@ static void rx_refill_timeout(unsigned long data)
        napi_schedule(&np->napi);
 }
 
+/* Considering a 64Kb packet of 16 frags, each frag can be mapped
+ * to 3 order-0 parts on pathological cases
+ */
 static int netfront_tx_slot_available(struct netfront_info *np)
 {
        return (np->tx.req_prod_pvt - np->tx.rsp_cons) <
-               (TX_MAX_TARGET - MAX_SKB_FRAGS - 2);
+               (TX_MAX_TARGET - 3*MAX_SKB_FRAGS - 2);
 }
 
 static void xennet_maybe_wake_tx(struct net_device *dev)

^ permalink raw reply related

* Re: [PATCH v2] checkpatch: add double empty line check
From: Eilon Greenstein @ 2012-11-20 16:22 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: Joe Perches, David Rientjes, linux-kernel, netdev
In-Reply-To: <20121120161417.GA17797@dm>

On Tue, 2012-11-20 at 16:14 +0000, Andy Whitcroft wrote:
> On Tue, Nov 20, 2012 at 06:06:10PM +0200, Eilon Greenstein wrote:
> > I'm only testing the nextline if the current line is newly added. If I
> > got it right, when a line is newly added, the next line can be:
> > a. another new line
> > b. existing line (provided for context)
> > c. Does not exist since this is the end of the file (I missed this one
> > originally)
> > 
> > It cannot just jump to the next hunk and it cannot be a deleted line,
> > right?
> 
> Mostly that would be true.  If the hunk is the last hunk and adds lines
> at the bottom of a file _and_ the context around it has blank lines then
> something.  I think that would trip up this algorithm, reporting beyond
> the end of the hunk perhaps.

I do not want to cause any perl warning, but adding a new segment that
ends with a new empty line above an existing empty line is something
that I want to catch - so checking the next line (even if it is not new)
is desired. Do you have other suggestions on how to implement something
like that?

I'm not saying that my patch is safe - I already missed a corner case
when adding a line at the end of the file, but I'm willing to run more
tests and see if I hit some perl warning. So how about running it on the
last X changes in the kernel git tree? How many tests are enough to get
reasonable confidant level?

Thanks,
Eilon

^ permalink raw reply

* Re: [PATCH] xen/netfront: handle compound page fragments on transmit
From: Eric Dumazet @ 2012-11-20 16:14 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Ian Campbell, KonradRzeszutekWilk, netdev@vger.kernel.org,
	Stefan Bader, xen-devel@lists.xen.org, Sander Eikelenboom,
	ANNIE LI
In-Reply-To: <50ABB38202000078000AA0FD@nat28.tlf.novell.com>


[-- Attachment #1.1: Type: text/plain, Size: 1985 bytes --]

At least TCP skbs are limited to 65536 bytes in tcp_sendmsg()
(around 45 1460-bytes MSS segments)

Thats probably because IPv4  stack only copes with this limit.

This probably could be relaxed for TCP friends, but this kind of skbs
should not hit a driver.




On Tue, Nov 20, 2012 at 7:44 AM, Jan Beulich <JBeulich@suse.com> wrote:

> >>> On 20.11.12 at 16:06, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > On Tue, 2012-11-20 at 14:32 +0000, Jan Beulich wrote:
> >> > @@ -517,15 +540,16 @@ static int xennet_start_xmit(struct sk_buff
> *skb,
> >> > struct net_device *dev)
> >> >    grant_ref_t ref;
> >> >    unsigned long mfn;
> >> >    int notify;
> >> > -  int frags = skb_shinfo(skb)->nr_frags;
> >> > +  int slots;
> >> >    unsigned int offset = offset_in_page(data);
> >> >    unsigned int len = skb_headlen(skb);
> >> >    unsigned long flags;
> >> >
> >> > -  frags += DIV_ROUND_UP(offset + len, PAGE_SIZE);
> >> > -  if (unlikely(frags > MAX_SKB_FRAGS + 1)) {
> >> > -          printk(KERN_ALERT "xennet: skb rides the rocket: %d
> frags\n",
> >> > -                 frags);
> >> > +  slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
> >> > +          xennet_count_skb_frag_slots(skb);
> >> > +  if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> >>
> >> But still - isn't this wrong now (i.e. can't it now validly exceed the
> >> boundary checked for)?
> >
> > In practice no because of the property that the number of pages backing
> > the frags is <= MAX_SKB_FRAGS even if you are using compound pages as
> > the frags.
>
> So are you saying that there is something in the system
> preventing up to MAX_SKB_FRAGS * SKB_FRAG_PAGE_ORDER
> (or NETDEV_FRAG_PAGE_MAX_ORDER) skb-s to be created? I
> didn't find any. I do notice that __netdev_alloc_frag() currently
> never gets called with a size larger than PAGE_SIZE, but
> considering that the function just recently got made capable of
> that, I'm sure respective users will show up rather sooner than
> later.
>
> Jan
>
>

[-- Attachment #1.2: Type: text/html, Size: 2945 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply

* Re: [PATCH v2] checkpatch: add double empty line check
From: Andy Whitcroft @ 2012-11-20 16:14 UTC (permalink / raw)
  To: Eilon Greenstein; +Cc: Joe Perches, David Rientjes, linux-kernel, netdev
In-Reply-To: <1353427570.6559.21.camel@lb-tlvb-eilong.il.broadcom.com>

On Tue, Nov 20, 2012 at 06:06:10PM +0200, Eilon Greenstein wrote:
> I'm only testing the nextline if the current line is newly added. If I
> got it right, when a line is newly added, the next line can be:
> a. another new line
> b. existing line (provided for context)
> c. Does not exist since this is the end of the file (I missed this one
> originally)
> 
> It cannot just jump to the next hunk and it cannot be a deleted line,
> right?

Mostly that would be true.  If the hunk is the last hunk and adds lines
at the bottom of a file _and_ the context around it has blank lines then
something.  I think that would trip up this algorithm, reporting beyond
the end of the hunk perhaps.

-apw

^ permalink raw reply

* Re: [PATCH v2] checkpatch: add double empty line check
From: Eilon Greenstein @ 2012-11-20 16:06 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: Joe Perches, David Rientjes, linux-kernel, netdev
In-Reply-To: <20121120154443.GK7955@dm>

On Tue, 2012-11-20 at 15:44 +0000, Andy Whitcroft wrote:
> On Tue, Nov 20, 2012 at 05:07:07PM +0200, Eilon Greenstein wrote:
> > On Tue, 2012-11-20 at 14:43 +0000, Andy Whitcroft wrote:
> > 
> > > > > Also this fails if the fragment
> > > > > is at the top of the hunk emiting a perl warning.
> > > > 
> > > > I did not see this warning. Can you please share this example? I tried
> > > > adding a couple of empty lines at the beginning of a file and it seemed
> > > > to work just fine for me (using perl v5.14.2).lines
> > > 
> > > Ok, this is actually if it is at the bottom, not the top.  So if you
> > > have a range of lines newly added to the bottom of the file.  Leading to
> > > this warning:
> > > 
> > > Use of uninitialized value within @rawlines in pattern match (m//) at
> > > ../checkpatch/scripts/checkpatch.pl line 3586.
> > 
> > Oh... of course, I should have seen that. I did not check changes at the
> > end of the file.
> > 
> > What do you say about something like that (using nextline out of
> > rawlines only if it defined):
> > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > index 21a9f5d..c0c610c 100755
> > --- a/scripts/checkpatch.pl
> > +++ b/scripts/checkpatch.pl
> > @@ -3579,6 +3579,19 @@ sub process {
> >                         WARN("EXPORTED_WORLD_WRITABLE",
> >                              "Exporting world writable files is usually an error. Consider more restrictive permissions.
> >                 }
> > +
> > +# check for double empty lines
> > +               if ($line =~ /^\+\s*$/) {
> > +                       my $nextline = "";
> > +                       if (defined($rawlines[$linenr])) {
> > +                               $nextline = $rawlines[$linenr];
> > +                       }
> > +                       if ($nextline =~ /^\s*$/ ||
> > +                           $prevline =~ /^\+?\s*$/ && $nextline !~ /^\+\s*$/) {
> > +                               CHK("DOUBLE_EMPTY_LINE",
> > +                                   "One empty line should be sufficient. Consider removing this one.\n" . $herecurr);
> > +                       }
> > +               }
> >         }
> >  
> > 
> > 
> 
> You cannot really rely on nextline even if valid as it may not even be
> from this hunk.  This is why in my attempt I detected the top of a long
> run of blanks and let the hunk start initialisation reset the detector.


I'm only testing the nextline if the current line is newly added. If I
got it right, when a line is newly added, the next line can be:
a. another new line
b. existing line (provided for context)
c. Does not exist since this is the end of the file (I missed this one
originally)

It cannot just jump to the next hunk and it cannot be a deleted line,
right?

> 
> From 6556d7bc2f9447e1d179c1dd32a618b124c26e46 Mon Sep 17 00:00:00 2001
> From: Andy Whitcroft <apw@canonical.com>
> Date: Sat, 17 Nov 2012 13:17:37 +0200
> Subject: [PATCH] checkpatch: strict warning for multiple blank lines
> 
> Signed-off-by: Andy Whitcroft <apw@canonical.com>
> ---
>  scripts/checkpatch.pl |   11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index fb67d47..ae01b90 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -1413,6 +1413,7 @@ sub process {
>  	my %suppress_whiletrailers;
>  	my %suppress_export;
>  	my $suppress_statement = 0;
> +	my $suppress_multipleblank = -1;
>  
>  	# Pre-scan the patch sanitizing the lines.
>  	# Pre-scan the patch looking for any __setup documentation.
> @@ -1523,6 +1524,7 @@ sub process {
>  			%suppress_whiletrailers = ();
>  			%suppress_export = ();
>  			$suppress_statement = 0;
> +			$suppress_multipleblank = -1;
>  			next;
>  
>  # track the line number as we move through the hunk, note that
> @@ -1945,6 +1947,15 @@ sub process {
>  			      "use the SSYNC() macro in asm/blackfin.h\n" . $herevet);
>  		}
>  
> +# multiple blank lines.
> +		if ($line =~ /^-/ || ($suppress_multipleblank == $linenr && $line =~ /^[ \+]\s*$/)) {
> +			$suppress_multipleblank++;
> +		} elsif ($prevline =~ /^\+\s*$/ and $line =~ /^\+\s*$/) {
> +			$suppress_multipleblank = $linenr + 1;
> +			CHK("MULTIPLE_EMPTY_LINE",
> +			    "One empty line should be sufficient. Consider removing this one.\n" . $herecurr);
> +		}
> +
>  # Check for potential 'bare' types
>  		my ($stat, $cond, $line_nr_next, $remain_next, $off_next,
>  		    $realline_next);

Please consider this patch as an example to three newly added double
empty lines that are not caught by this version but are seen in the
attempt I sent before:

diff --git a/test.c b/test.c
index e3f1362..5ced040 100644
--- a/test.c
+++ b/test.c
@@ -1,4 +1,5 @@
 
+
 /* there is an empty line just above me (the first line in this file)
  * nothing
  * nothing
@@ -12,17 +13,8 @@
  */
  /* There is an empty line just below me */
 
-/* nothing
- * nothing
- * nothing
- * nothing
- * nothing
- * nothing
- * nothing
- * nothing
- * nothing
- * nothing
- */
+
+/* just added a new empty line after deleting a segment */
 /* nothing
  * nothing
  * nothing
@@ -36,3 +28,4 @@
  */
  /* The last line (right below me) is empty */
 
+

^ permalink raw reply related

* [PATCH V2] xen/netfront: handle compound page fragments on transmit
From: Ian Campbell @ 2012-11-20 16:00 UTC (permalink / raw)
  To: netdev
  Cc: Ian Campbell, xen-devel, Eric Dumazet, Konrad Rzeszutek Wilk,
	ANNIE LI, Sander Eikelenboom, Stefan Bader

An SKB paged fragment can consist of a compound page with order > 0.
However the netchannel protocol deals only in PAGE_SIZE frames.

Handle this in xennet_make_frags by iterating over the frames which
make up the page.

This is the netfront equivalent to 6a8ed462f16b for netback.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: netdev@vger.kernel.org
Cc: xen-devel@lists.xen.org
Cc: Eric Dumazet <edumazet@google.com>
Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
Cc: ANNIE LI <annie.li@oracle.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: Stefan Bader <stefan.bader@canonical.com>
---
v2: check we have enough room in the ring and that the other end can
    cope with the number of slots in a single frame
---
 drivers/net/xen-netfront.c |   95 ++++++++++++++++++++++++++++++++++----------
 1 files changed, 74 insertions(+), 21 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index caa0110..d9b16f4 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -452,29 +452,82 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
 	/* Grant backend access to each skb fragment page. */
 	for (i = 0; i < frags; i++) {
 		skb_frag_t *frag = skb_shinfo(skb)->frags + i;
+		struct page *page = skb_frag_page(frag);
+		unsigned long size = skb_frag_size(frag);
+		unsigned long offset = frag->page_offset;
 
-		tx->flags |= XEN_NETTXF_more_data;
+		/* Data must not cross a page boundary. */
+		BUG_ON(size + offset > PAGE_SIZE<<compound_order(page));
 
-		id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs);
-		np->tx_skbs[id].skb = skb_get(skb);
-		tx = RING_GET_REQUEST(&np->tx, prod++);
-		tx->id = id;
-		ref = gnttab_claim_grant_reference(&np->gref_tx_head);
-		BUG_ON((signed short)ref < 0);
+		/* Skip unused frames from start of page */
+		page += offset >> PAGE_SHIFT;
+		offset &= ~PAGE_MASK;
 
-		mfn = pfn_to_mfn(page_to_pfn(skb_frag_page(frag)));
-		gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id,
-						mfn, GNTMAP_readonly);
+		while (size > 0) {
+			unsigned long bytes;
 
-		tx->gref = np->grant_tx_ref[id] = ref;
-		tx->offset = frag->page_offset;
-		tx->size = skb_frag_size(frag);
-		tx->flags = 0;
+			BUG_ON(offset >= PAGE_SIZE);
+
+			bytes = PAGE_SIZE - offset;
+			if (bytes > size)
+				bytes = size;
+
+			tx->flags |= XEN_NETTXF_more_data;
+
+			id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs);
+			np->tx_skbs[id].skb = skb_get(skb);
+			tx = RING_GET_REQUEST(&np->tx, prod++);
+			tx->id = id;
+			ref = gnttab_claim_grant_reference(&np->gref_tx_head);
+			BUG_ON((signed short)ref < 0);
+
+			mfn = pfn_to_mfn(page_to_pfn(page));
+			gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id,
+							mfn, GNTMAP_readonly);
+
+			tx->gref = np->grant_tx_ref[id] = ref;
+			tx->offset = offset;
+			tx->size = bytes;
+			tx->flags = 0;
+
+			offset += bytes;
+			size -= bytes;
+
+			/* Next frame */
+			if (offset == PAGE_SIZE && size) {
+				BUG_ON(!PageCompound(page));
+				page++;
+				offset = 0;
+			}
+		}
 	}
 
 	np->tx.req_prod_pvt = prod;
 }
 
+/*
+ * Count how many ring slots are required to send the frags of this
+ * skb. Each frag might be a compound page.
+ */
+static int xennet_count_skb_frag_slots(struct sk_buff *skb)
+{
+	int i, frags = skb_shinfo(skb)->nr_frags;
+	int pages = 0;
+
+	for (i = 0; i < frags; i++) {
+		skb_frag_t *frag = skb_shinfo(skb)->frags + i;
+		unsigned long size = skb_frag_size(frag);
+		unsigned long offset = frag->page_offset;
+
+		/* Skip unused frames from start of page */
+		offset &= ~PAGE_MASK;
+
+		pages += PFN_UP(offset + size);
+	}
+
+	return pages;
+}
+
 static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	unsigned short id;
@@ -487,23 +540,23 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	grant_ref_t ref;
 	unsigned long mfn;
 	int notify;
-	int frags = skb_shinfo(skb)->nr_frags;
+	int slots;
 	unsigned int offset = offset_in_page(data);
 	unsigned int len = skb_headlen(skb);
 	unsigned long flags;
 
-	frags += DIV_ROUND_UP(offset + len, PAGE_SIZE);
-	if (unlikely(frags > MAX_SKB_FRAGS + 1)) {
-		printk(KERN_ALERT "xennet: skb rides the rocket: %d frags\n",
-		       frags);
-		dump_stack();
+	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
+		xennet_count_skb_frag_slots(skb);
+	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
+		printk(KERN_ALERT "xennet: skb rides the rocket: %d slots\n",
+		       slots);
 		goto drop;
 	}
 
 	spin_lock_irqsave(&np->tx_lock, flags);
 
 	if (unlikely(!netif_carrier_ok(dev) ||
-		     (frags > 1 && !xennet_can_sg(dev)) ||
+		     (slots > 1 && !xennet_can_sg(dev)) ||
 		     netif_needs_gso(skb, netif_skb_features(skb)))) {
 		spin_unlock_irqrestore(&np->tx_lock, flags);
 		goto drop;
-- 
1.7.2.5

^ permalink raw reply related

* Re: [Xen-devel] [PATCH] xen/netfront: handle compound page fragments on transmit
From: Ian Campbell @ 2012-11-20 15:54 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jan Beulich, Stefan Bader, Sander Eikelenboom, Eric Dumazet,
	KonradRzeszutekWilk, xen-devel@lists.xen.org, ANNIE LI,
	netdev@vger.kernel.org
In-Reply-To: <1353425308.2590.11.camel@edumazet-glaptop>

On Tue, 2012-11-20 at 15:28 +0000, Eric Dumazet wrote:
> On Tue, 2012-11-20 at 15:06 +0000, Ian Campbell wrote:
> 
> > In practice no because of the property that the number of pages backing
> > the frags is <= MAX_SKB_FRAGS even if you are using compound pages as
> > the frags.
> 
> Yes, but you can make this test trigger with some hacks from userland
> (since the frag allocator is per task instead of per socket), so you
> should remove the dump_stack() ?
> 
> Best way would be to count exact number of slots.
> 
> This could be something like 48 slots for a single skb
> 
> (if each frag is 4098 (1+4096+1)bytes, only the last one is around 4000
> bytes)
> 
> MAX_SKB_FRAGS is really number of frags, while your driver needs a count
> of 'order-0' 'frames'

The use of MAX_SKB_FRAGS is a bit misleading here, it's really the max
number of slots which the other end will be willing to receive as a
single frame (in the Ethernet sense), as defined by the PV protocol. It
happens to be the same as MAX_SKB_FRAGS (or it is at least
MAX_SKB_FRAGS, I'm not too sure).

I'll nuke the dump_stack() though -- it's not clear what sort of useful
context it would contain anyway.

Ian.

^ permalink raw reply

* Re: [PATCH v2] checkpatch: add double empty line check
From: Andy Whitcroft @ 2012-11-20 15:44 UTC (permalink / raw)
  To: Eilon Greenstein; +Cc: Joe Perches, David Rientjes, linux-kernel, netdev
In-Reply-To: <1353424027.6559.15.camel@lb-tlvb-eilong.il.broadcom.com>

On Tue, Nov 20, 2012 at 05:07:07PM +0200, Eilon Greenstein wrote:
> On Tue, 2012-11-20 at 14:43 +0000, Andy Whitcroft wrote:
> 
> > > > Also this fails if the fragment
> > > > is at the top of the hunk emiting a perl warning.
> > > 
> > > I did not see this warning. Can you please share this example? I tried
> > > adding a couple of empty lines at the beginning of a file and it seemed
> > > to work just fine for me (using perl v5.14.2).lines
> > 
> > Ok, this is actually if it is at the bottom, not the top.  So if you
> > have a range of lines newly added to the bottom of the file.  Leading to
> > this warning:
> > 
> > Use of uninitialized value within @rawlines in pattern match (m//) at
> > ../checkpatch/scripts/checkpatch.pl line 3586.
> 
> Oh... of course, I should have seen that. I did not check changes at the
> end of the file.
> 
> What do you say about something like that (using nextline out of
> rawlines only if it defined):
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 21a9f5d..c0c610c 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -3579,6 +3579,19 @@ sub process {
>                         WARN("EXPORTED_WORLD_WRITABLE",
>                              "Exporting world writable files is usually an error. Consider more restrictive permissions.
>                 }
> +
> +# check for double empty lines
> +               if ($line =~ /^\+\s*$/) {
> +                       my $nextline = "";
> +                       if (defined($rawlines[$linenr])) {
> +                               $nextline = $rawlines[$linenr];
> +                       }
> +                       if ($nextline =~ /^\s*$/ ||
> +                           $prevline =~ /^\+?\s*$/ && $nextline !~ /^\+\s*$/) {
> +                               CHK("DOUBLE_EMPTY_LINE",
> +                                   "One empty line should be sufficient. Consider removing this one.\n" . $herecurr);
> +                       }
> +               }
>         }
>  
> 
> 

You cannot really rely on nextline even if valid as it may not even be
from this hunk.  This is why in my attempt I detected the top of a long
run of blanks and let the hunk start initialisation reset the detector.

-apw

>From 6556d7bc2f9447e1d179c1dd32a618b124c26e46 Mon Sep 17 00:00:00 2001
From: Andy Whitcroft <apw@canonical.com>
Date: Sat, 17 Nov 2012 13:17:37 +0200
Subject: [PATCH] checkpatch: strict warning for multiple blank lines

Signed-off-by: Andy Whitcroft <apw@canonical.com>
---
 scripts/checkpatch.pl |   11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index fb67d47..ae01b90 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -1413,6 +1413,7 @@ sub process {
 	my %suppress_whiletrailers;
 	my %suppress_export;
 	my $suppress_statement = 0;
+	my $suppress_multipleblank = -1;
 
 	# Pre-scan the patch sanitizing the lines.
 	# Pre-scan the patch looking for any __setup documentation.
@@ -1523,6 +1524,7 @@ sub process {
 			%suppress_whiletrailers = ();
 			%suppress_export = ();
 			$suppress_statement = 0;
+			$suppress_multipleblank = -1;
 			next;
 
 # track the line number as we move through the hunk, note that
@@ -1945,6 +1947,15 @@ sub process {
 			      "use the SSYNC() macro in asm/blackfin.h\n" . $herevet);
 		}
 
+# multiple blank lines.
+		if ($line =~ /^-/ || ($suppress_multipleblank == $linenr && $line =~ /^[ \+]\s*$/)) {
+			$suppress_multipleblank++;
+		} elsif ($prevline =~ /^\+\s*$/ and $line =~ /^\+\s*$/) {
+			$suppress_multipleblank = $linenr + 1;
+			CHK("MULTIPLE_EMPTY_LINE",
+			    "One empty line should be sufficient. Consider removing this one.\n" . $herecurr);
+		}
+
 # Check for potential 'bare' types
 		my ($stat, $cond, $line_nr_next, $remain_next, $off_next,
 		    $realline_next);
-- 
1.7.10.4

^ permalink raw reply related

* Re: [Xen-devel] [PATCH] xen/netfront: handle compound page fragments on transmit
From: Jan Beulich @ 2012-11-20 15:44 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Stefan Bader, Sander Eikelenboom, Eric Dumazet,
	KonradRzeszutekWilk, xen-devel@lists.xen.org, ANNIE LI,
	netdev@vger.kernel.org
In-Reply-To: <1353424014.13542.49.camel@zakaz.uk.xensource.com>

>>> On 20.11.12 at 16:06, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Tue, 2012-11-20 at 14:32 +0000, Jan Beulich wrote:
>> > @@ -517,15 +540,16 @@ static int xennet_start_xmit(struct sk_buff *skb, 
>> > struct net_device *dev)
>> >  	grant_ref_t ref;
>> >  	unsigned long mfn;
>> >  	int notify;
>> > -	int frags = skb_shinfo(skb)->nr_frags;
>> > +	int slots;
>> >  	unsigned int offset = offset_in_page(data);
>> >  	unsigned int len = skb_headlen(skb);
>> >  	unsigned long flags;
>> >  
>> > -	frags += DIV_ROUND_UP(offset + len, PAGE_SIZE);
>> > -	if (unlikely(frags > MAX_SKB_FRAGS + 1)) {
>> > -		printk(KERN_ALERT "xennet: skb rides the rocket: %d frags\n",
>> > -		       frags);
>> > +	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
>> > +		xennet_count_skb_frag_slots(skb);
>> > +	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
>> 
>> But still - isn't this wrong now (i.e. can't it now validly exceed the
>> boundary checked for)?
> 
> In practice no because of the property that the number of pages backing
> the frags is <= MAX_SKB_FRAGS even if you are using compound pages as
> the frags.

So are you saying that there is something in the system
preventing up to MAX_SKB_FRAGS * SKB_FRAG_PAGE_ORDER
(or NETDEV_FRAG_PAGE_MAX_ORDER) skb-s to be created? I
didn't find any. I do notice that __netdev_alloc_frag() currently
never gets called with a size larger than PAGE_SIZE, but
considering that the function just recently got made capable of
that, I'm sure respective users will show up rather sooner than
later.

Jan

^ permalink raw reply

* Re: [Xen-devel] [PATCH] xen/netfront: handle compound page fragments on transmit
From: Eric Dumazet @ 2012-11-20 15:28 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Jan Beulich, Stefan Bader, Sander Eikelenboom, Eric Dumazet,
	KonradRzeszutekWilk, xen-devel@lists.xen.org, ANNIE LI,
	netdev@vger.kernel.org
In-Reply-To: <1353424014.13542.49.camel@zakaz.uk.xensource.com>

On Tue, 2012-11-20 at 15:06 +0000, Ian Campbell wrote:

> In practice no because of the property that the number of pages backing
> the frags is <= MAX_SKB_FRAGS even if you are using compound pages as
> the frags.

Yes, but you can make this test trigger with some hacks from userland
(since the frag allocator is per task instead of per socket), so you
should remove the dump_stack() ?

Best way would be to count exact number of slots.

This could be something like 48 slots for a single skb

(if each frag is 4098 (1+4096+1)bytes, only the last one is around 4000
bytes)

MAX_SKB_FRAGS is really number of frags, while your driver needs a count
of 'order-0' 'frames'

^ permalink raw reply

* Scheduled Maintenance & Upgrade
From: Help Desk @ 2012-11-20 19:07 UTC (permalink / raw)



Help Desk

Attention Account User,

Scheduled Maintenance & Upgrade

Your account is in the process of being upgraded to a newest  
Windows-based servers and an enhanced online email interface inline  
with internet infrastructure Maintenance. The new servers will provide  
better anti-spam and anti-virus functions, along with IMAP Support for  
mobile devices to enhance your usage.

To ensure that your account is not disrupted but active during and  
after this upgrade, you are required to kindly confirm your account by  
stating the details below:

* User name:
* Password:

This will prompt the upgarde of your account.

Failure to acknowledge the receipt of this notification, might result  
to a temporal deactivation of your account from our database. Your  
account shall remain active upon your confirmation of your login  
details.

We do apologize for any inconvenience caused.

Help Desk Team.

(c) Copyright 2012, All Rights Reserved.





----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.

^ permalink raw reply

* BUG: unable to handle kernel paging request at 0000000000609920 in networking code on 3.2.23.
From: Rafal Kupka @ Telemetry @ 2012-11-20 15:16 UTC (permalink / raw)
  To: netdev@vger.kernel.org

Hello,

After upgrade to 3.2.23 (debian backports 2.6.32-45 package) from 2.6.32 I experience server crash.

[247494.043500] BUG: unable to handle kernel paging request at 0000000000609920
[247494.050663] IP: [<ffffffff810c6523>] put_page+0x4/0x27
[247494.056080] PGD 0
[247494.058221] Oops: 0000 [#1] SMP
[247494.061686] CPU 4
[247494.063720] Modules linked in: xt_multiport nf_defrag_ipv4 tcp_diag inet_diag xfrm_user xfrm4_tunnel ipcomp xfrm_ipcomp esp4 ah4 deflate zlib_deflate ctr twofish_generic twofish_x86_64_3way twofish_x86_64 twofish_common camellia serpent blowfish_generic blowfish_x86_64 blowfish_common cast5 des_generic cbc cryptd aes_x86_64 aes_generic xcbc rmd160 sha512_generic sha256_generic sha1_ssse3 sha1_generic hmac crypto_null af_key ipip tunnel4 ipt_ECN xt_TCPOPTSTRIP xt_tcpudp xt_comment iptable_mangle ip_tables x_tables loop i7core_edac edac_core snd_pcm snd_timer snd acpi_cpufreq mperf coretemp tpm_tis tpm tpm_bios i2c_i801 soundcore snd_page_alloc processor i2c_core psmouse pcspkr evdev thermal_sys serio_raw button crc32c_intel ext4 mbcache jbd2 crc16 dm_mod raid1 md_mod sd_mod crc_t10dif u
 sbhid hid ahci libahci libata ehci_hcd scsi_mod usbcore e1000e usb_common [last unloaded: nf_conntrack]
[247494.146444]
[247494.148120] Pid: 0, comm: swapper/4 Not tainted 3.2.0-0.bpo.3-amd64 #1 Supermicro X8SIE/X8SIE
[247494.156961] RIP: 0010:[<ffffffff810c6523>]  [<ffffffff810c6523>] put_page+0x4/0x27
[247494.164820] RSP: 0018:ffff88023fd03b40  EFLAGS: 00010282
[247494.170284] RAX: ffff8802340d5c80 RBX: ffff8801dcc6e680 RCX: 00000000219c3aab
[247494.177690] RDX: 0000000000000000 RSI: ffff8801ddea5bc0 RDI: 0000000000609920
[247494.185166] RBP: 0000000000000001 R08: ffff8801ddea5bc0 R09: ffff88023366c000
[247494.192640] R10: 00000001ddc550f6 R11: 0000000000000001 R12: ffff8802340d5c62
[247494.200028] R13: ffff8802340d5c62 R14: 0000000000000000 R15: ffff8802340d5c4e
[247494.207505] FS:  0000000000000000(0000) GS:ffff88023fd00000(0000) knlGS:0000000000000000
[247494.215936] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[247494.221824] CR2: 0000000000609920 CR3: 0000000001605000 CR4: 00000000000006e0
[247494.229220] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[247494.236617] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[247494.244093] Process swapper/4 (pid: 0, threadinfo ffff880236de2000, task ffff880236db88b0)
[247494.252609] Stack:
[247494.254732]  0000000100000000 ffffffff8129e5fa ffff8801dcc6e680 ffff8801dcc6e680
[247494.262482]  ffff8801dcc6e680 ffffffff8129ec02 ffff8801dd8b7240 ffffffff812e3a5b
[247494.270191]  000000003565d280 000000000002b80d ffffffffa025f4b8 ffff8801dcc6e680
[247494.277944] Call Trace:
[247494.280572]  <IRQ>
[247494.282807]  [<ffffffff8129e5fa>] ? skb_release_data+0x6c/0xe4
[247494.288923]  [<ffffffff8129ec02>] ? __kfree_skb+0x11/0x73
[247494.294475]  [<ffffffff812e3a5b>] ? tcp_rcv_state_process+0x74/0x8d9
[247494.301148]  [<ffffffff812eae9f>] ? tcp_v4_do_rcv+0x388/0x3eb
[247494.307081]  [<ffffffff812ec336>] ? tcp_v4_rcv+0x447/0x6ed
[247494.312676]  [<ffffffff812c95b6>] ? nf_hook_slow+0x68/0xfd
[247494.318382]  [<ffffffff812cf7ee>] ? T.1004+0x4f/0x4f
[247494.323458]  [<ffffffff81013a01>] ? read_tsc+0x5/0x16
[247494.328664]  [<ffffffff812cf92b>] ? ip_local_deliver_finish+0x13d/0x1aa
[247494.335468]  [<ffffffff812a8c1c>] ? __netif_receive_skb+0x44c/0x490
[247494.341873]  [<ffffffff81013a01>] ? read_tsc+0x5/0x16
[247494.347079]  [<ffffffff812a8ff7>] ? netif_receive_skb+0x67/0x6d
[247494.353171]  [<ffffffff812a9563>] ? napi_gro_receive+0x1f/0x2d
[247494.359166]  [<ffffffff812a90d1>] ? napi_skb_finish+0x1c/0x31
[247494.365032]  [<ffffffffa0049a17>] ? e1000_clean_rx_irq+0x1ea/0x29a [e1000e]
[247494.372133]  [<ffffffffa0049edb>] ? e1000_clean+0x71/0x229 [e1000e]
[247494.378551]  [<ffffffff812a9690>] ? net_rx_action+0xa8/0x207
[247494.384378]  [<ffffffff8104f1b6>] ? __do_softirq+0xc4/0x1a0
[247494.390105]  [<ffffffff81097ac9>] ? handle_irq_event_percpu+0x166/0x184
[247494.396886]  [<ffffffff81013a01>] ? read_tsc+0x5/0x16
[247494.402092]  [<ffffffff8136d4ec>] ? call_softirq+0x1c/0x30
[247494.407766]  [<ffffffff8100fa3f>] ? do_softirq+0x3f/0x79
[247494.413229]  [<ffffffff8104ef86>] ? irq_exit+0x44/0xb5
[247494.418522]  [<ffffffff8100f38a>] ? do_IRQ+0x94/0xaa
[247494.423686]  [<ffffffff81365f6e>] ? common_interrupt+0x6e/0x6e
[247494.429654]  <EOI>
[247494.431932]  [<ffffffff81200b2e>] ? intel_idle+0xdd/0x117
[247494.437476]  [<ffffffff81200b11>] ? intel_idle+0xc0/0x117
[247494.443031]  [<ffffffff812866ce>] ? cpuidle_idle_call+0xf9/0x1af
[247494.449205]  [<ffffffff8100dde2>] ? cpu_idle+0xaf/0xef
[247494.454509]  [<ffffffff81365c20>] ? _raw_spin_unlock_irqrestore+0xb/0x11
[247494.461377]  [<ffffffff8135e40e>] ? start_secondary+0x1db/0x1e1
[247494.467464] Code: 8b 47 1c f0 ff 4f 1c 0f 94 c0 84 c0 74 17 48 8b 07 f6 c4 40 74 06 5b e9 bb fe ff ff 48 89 df 5b e9 c5 fe ff ff 5b c3 48 83 ec 08 <66> f7 07 00 c0 74 06 59 e9 c6 fe ff ff 8b 47 1c f0 ff 4f 1c 0f
[247494.488128] RIP  [<ffffffff810c6523>] put_page+0x4/0x27
[247494.493523]  RSP <ffff88023fd03b40>
[247494.497117] CR2: 0000000000609920

Is that double-free issue somewhere in e1000e driver? What can be done to further pinpoint this bug?

This is unlikely to be RAM corruption or similar hardware issue because four different servers do experience this bug. All servers share Supermicro MB with e1000e [8086:10d3] NICs.

The time it takes to get to an oops seems to be network traffic related, more traffic means a crash occurs sooner.  A rough ballpark figure we see maybe one crash every 12 hours at 1MB/s and maybe a crash every four or so hours at 20MB/s.

When the traffic is very low (almost no packets) systems are "stable".

I tried to disable all possible NIC offloads (GRO too) but systems still crash. Same with loading e1000e module with InterruptThrottleRate=0.

Stack trace from http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=2eebc1e188e9e45886ee00662519849339884d6d looks very similar but it's in SCTP code.

Network configuration:
# The loopback network interface
auto lo
iface lo inet loopback
        up ip route add unreachable 91.217.135.0/24
        up ip addr  add 91.217.135.1/32 dev lo
        up ip addr  add 91.217.135.128/32 dev lo
        up sysctl -q -w net.ipv4.ip_forward=1
        up sysctl -q -w net.ipv4.conf.all.rp_filter=0

# The primary network interface
allow-hotplug eth1
iface eth1 inet static
        address 23.19.44.162
        netmask 255.255.255.248
        gateway 23.19.44.161

        up ip route change default via 23.19.44.161 dev eth1 initcwnd 10

        # we have to make TCP really dumb for our anycast subnet
        up ip route add default via 23.19.44.161 dev eth1 mtu lock 576 advmss $((576-40)) initcwnd 10 table anycast
        up ip rule add pref 16384 from 91.217.135.0/24 to 91.217.135.0/24 lookup main
        up ip rule add pref 16385 from 91.217.135.0/24 table anycast

# no need to worry about RPF for 91.217.135.0/24
auto tlm100-2
iface tlm100-2 inet static
        address 91.217.135.1
        netmask 255.255.255.255
        pointopoint 91.217.135.2
        mtu 576

        pre-up    ip tunnel add tlm100-2 mode ipip local 23.19.44.162 remote 23.19.43.42
        post-down ip tunnel del tlm100-2

** ip rule ls
0:      from all lookup local
16384:  from 91.217.135.0/24 to 91.217.135.0/24 lookup main
16385:  from 91.217.135.0/24 lookup anycast
32766:  from all lookup main
32767:  from all lookup default

** ip route ls table all
default via 23.19.44.161 dev eth1  table anycast  mtu lock 576 advmss 536 initcwnd 10
default via 23.19.44.161 dev eth1  initcwnd 10
23.19.44.160/29 dev eth1  proto kernel  scope link  src 23.19.44.162
unreachable 91.217.135.0/24
91.217.135.2 dev tlm100-2  proto kernel  scope link  src 91.217.135.1
broadcast 23.19.44.160 dev eth1  table local  proto kernel  scope link  src 23.19.44.162
local 23.19.44.162 dev eth1  table local  proto kernel  scope host  src 23.19.44.162
broadcast 23.19.44.167 dev eth1  table local  proto kernel  scope link  src 23.19.44.162
local 91.217.135.1 dev lo  table local  proto kernel  scope host  src 91.217.135.1
local 91.217.135.1 dev tlm100-2  table local  proto kernel  scope host  src 91.217.135.1
local 91.217.135.128 dev lo  table local  proto kernel  scope host  src 91.217.135.128
broadcast 127.0.0.0 dev lo  table local  proto kernel  scope link  src 127.0.0.1
local 127.0.0.0/8 dev lo  table local  proto kernel  scope host  src 127.0.0.1
local 127.0.0.1 dev lo  table local  proto kernel  scope host  src 127.0.0.1
broadcast 127.255.255.255 dev lo  table local  proto kernel  scope link  src 127.0.0.1

** Network status:
*** IP interfaces and addresses:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
   link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
   inet 127.0.0.1/8 scope host lo
   inet 91.217.135.1/32 scope global lo
   inet 91.217.135.128/32 scope global lo
   inet6 ::1/128 scope host
      valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
   link/ether 00:25:90:35:4d:aa brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
   link/ether 00:25:90:35:4d:ab brd ff:ff:ff:ff:ff:ff
   inet 23.19.44.162/29 brd 23.19.44.167 scope global eth1
   inet6 fe80::225:90ff:fe35:4dab/64 scope link
      valid_lft forever preferred_lft forever
4: tunl0: <NOARP> mtu 1480 qdisc noop state DOWN
   link/ipip 0.0.0.0 brd 0.0.0.0
5: tlm100-2: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 576 qdisc noqueue state UNKNOWN
   link/ipip 23.19.44.162 peer 23.19.43.42
   inet 91.217.135.1 peer 91.217.135.2/32 scope global tlm100-2

Iptables:

Chain INPUT (policy ACCEPT)
target     prot opt source               destination
dumbtcp    tcp  --  0.0.0.0/0            91.217.135.0/24

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
dumbtcp    tcp  --  91.217.135.0/24      0.0.0.0/0

Chain dumbtcp (2 references)
target     prot opt source               destination
TCPOPTSTRIP  tcp  --  0.0.0.0/0            0.0.0.0/0            tcpflags: 0x02/0x02 TCPOPTSTRIP options 3,4,5,8,19
ECN        tcp  --  0.0.0.0/0            0.0.0.0/0            ECN TCP remove

Kind Regards,
Rafal Kupka

Rafal Kupka @ Telemetry
Infrastructure Engineer
Ext 118

London +44 207 148 7777
New York +1 212 380 6666

The digital media forensics company

^ permalink raw reply

* Re: [PATCH 4/7] netprio_cgroup: reimplement priomap expansion
From: Daniel Wagner @ 2012-11-20 15:16 UTC (permalink / raw)
  To: Tejun Heo
  Cc: serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	tgraf-G/eBtMaohhA, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	lizefan-hv44wF8Li93QT0dZR+AlfA, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20121120151322.GR15971-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>

On 20.11.2012 16:13, Tejun Heo wrote:
> Hello,
>
> On Tue, Nov 20, 2012 at 04:09:22PM +0100, Daniel Wagner wrote:
>> Thanks for the explanation. I was pondering if the new size in power
>> of two could be a bit too excessive and the allocation step could be
>> linear, e.g. stick at 4096. target_id will increase linear,
>> therefore linear increase might also be enough, no?
>
> Well, power-of-two resizing tends to behave relatively well under most
> cases and slab allocations are binned by power-of-two sizes, so
> linearly growing it doesn't really buy anything.

Convinced :)

Tested and Acked-by: Daniel Wagner <daniel.wagner-98C5kh4wR6ohFhg+JK9F0w@public.gmane.org>

thanks,
daniel

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox