Netdev List

* Re: Reply: Is 802.3ad mode in bonding useful ?
From: Neil Horman @ 2011-04-28 13:23 UTC (permalink / raw)
  To: Xiong Huang; +Cc: WeipingPan, netdev@vger.kernel.org
In-Reply-To: <0FBC7C9C2640634A8B837C2EFFCFF32F0184732D72@SHEXMB-01.global.atheros.com>

On Thu, Apr 28, 2011 at 08:31:33PM +0800, Xiong Huang wrote:
> Is there any plan to support cisco FEC/GEC protocol in bonding ?
> 
Not explicitly, no, since the round-robin and xor modes should already be compatible,
and EtherChannel can interoperate with 802.3ad bonds if the switch's
EtherChannel is configured to be in LACP mode.

Neil
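
For readers unfamiliar with the xor mode mentioned above, the following is a minimal sketch of how a layer-2 XOR hash can map a frame to one link of the aggregate. It follows the "(source MAC XOR destination MAC) modulo slave count" formula as I understand it from the bonding documentation. The function name is hypothetical, and using only the last byte of each address is an assumption of this sketch, not a statement about the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pick a slave index from the last byte of each MAC. */
static unsigned int xor_pick_slave(const uint8_t src[6], const uint8_t dst[6],
                                   unsigned int slave_count)
{
    assert(slave_count > 0);
    /* The same address pair always hashes to the same link. */
    return (unsigned int)(src[5] ^ dst[5]) % slave_count;
}
```

Because the result depends only on the address pair, frames of one conversation stay on one link, which is why a statically configured channel group on the switch can cope with it.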


> -Xiong
> ________________________________________
> From: netdev-owner@vger.kernel.org [netdev-owner@vger.kernel.org] on behalf of Neil Horman [nhorman@tuxdriver.com]
> Sent: April 28, 2011 5:21
> To: WeipingPan
> Cc: netdev@vger.kernel.org
> Subject: Re: Is 802.3ad mode in bonding useful ?
> 
> On Thu, Apr 28, 2011 at 03:33:50PM +0800, WeipingPan wrote:
> > Hi, all,
> >
> > 802.3ad mode in bonding implements 802.3ad standard.
> >
> > I am just wondering  802.3ad mode is useful,
> > since  bonding has many modes like balance-rr, active-backup, etc.
> >
> Yes, of course it's useful.  For switches which support 802.3ad, this mode
> allows for both peers to understand that the links in the bond are acting as an
> aggregate, which makes it easier to prevent things like inadvertently looped
> back frames, for which the other modes have to have all sorts of hacks to
> prevent.
> Neil
> 
> > thanks
> > Weiping Pan
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >

* Re: [PATCH 2/3] bql: Byte queue limits
From: Tom Herbert @ 2011-04-28 12:53 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: davem, netdev
In-Reply-To: <20110426091620.7c576a98@nehalam>

> Although this is implemented as a device attribute, from a layering
> point of view it feels like a queuing strategy. Having two competing
> ways to do something is not always a good idea. Why is this not a
> qdisc parameter or a new qdisc?
>
It's an interesting idea, but I'm not sure that this would fit into
the qdisc model without some major re-architecture.  The qdiscs manage
the host side above the device; BQL is another layer that would need to be
added beyond that, one which is logically below the leaf qdiscs.

Tom
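
To make the layering argument concrete, here is a minimal sketch of a byte limit sitting between the qdisc and the device ring. It is not the algorithm from the patch, which is not quoted in this message. The structure, the function names and the fixed limit are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a byte limit below the leaf qdisc. */
struct byte_limit {
    unsigned int limit;     /* maximum bytes allowed in flight */
    unsigned int inflight;  /* bytes handed to the device, not yet completed */
};

/* Account for a packet handed to the device.
 * Returns true if the queue should now be stopped. */
static bool bl_sent(struct byte_limit *bl, unsigned int bytes)
{
    bl->inflight += bytes;
    return bl->inflight >= bl->limit;
}

/* Account for a transmit completion.
 * Returns true if the queue may be restarted. */
static bool bl_completed(struct byte_limit *bl, unsigned int bytes)
{
    assert(bytes <= bl->inflight);
    bl->inflight -= bytes;
    return bl->inflight < bl->limit;
}
```

The point of the sketch is that the accounting is driven by device completions, which a qdisc does not see directly.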


* Re: [PATCHv4 2/7] ethtool: Call ethtool's get/set_settings callbacks with cleaned data
From: Ben Hutchings @ 2011-04-28 12:46 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: David Decotigny, David S. Miller, mirq-linux, Alexander Duyck,
	Eilon Greenstein, Grant Grundler, e1000-devel, linux-kernel,
	netdev
In-Reply-To: <20110428073417.GA2220@redhat.com>

On Thu, 2011-04-28 at 09:34 +0200, Stanislaw Gruszka wrote:
> On Wed, Apr 27, 2011 at 09:32:38PM -0700, David Decotigny wrote:
> > --- a/drivers/net/stmmac/stmmac_ethtool.c
> > +++ b/drivers/net/stmmac/stmmac_ethtool.c
> > @@ -237,13 +237,12 @@ stmmac_set_pauseparam(struct net_device *netdev,
> >  
> >  	if (phy->autoneg) {
> >  		if (netif_running(netdev)) {
> > -			struct ethtool_cmd cmd;
> > +			struct ethtool_cmd cmd = { .cmd = ETHTOOL_SSET };
> >  			/* auto-negotiation automatically restarted */
> > -			cmd.cmd = ETHTOOL_NWAY_RST;
> 
> Why did you change ETHTOOL_NWAY_RST to ETHTOOL_SSET ?

Because the function it's calling is an implementation of ETHTOOL_SSET,
not ETHTOOL_NWAY_RST.
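
A side effect of the quoted change is worth spelling out: in C, a designated initializer zero-initializes every member that is not named, so the callback receives clean data instead of whatever was on the stack. The sketch below shows this with a hypothetical structure. It is not the real struct ethtool_cmd, and DEMO_CMD_SSET is a placeholder value.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_CMD_SSET 0x2u  /* placeholder value, assumed for this sketch */

/* Hypothetical stand-in for a command structure passed to a driver callback. */
struct demo_cmd {
    uint32_t cmd;
    uint32_t advertising;
    uint16_t speed;
    uint8_t  duplex;
    uint8_t  autoneg;
};

/* Build a command whose unnamed members are all zero. */
static struct demo_cmd demo_make_sset(void)
{
    struct demo_cmd c = { .cmd = DEMO_CMD_SSET };

    /* Members not named in the initializer are guaranteed to be zero. */
    assert(c.advertising == 0 && c.speed == 0);
    return c;
}
```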

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

* Re: [PATCH 2.6.38.4] mii: add support of pause frames in mii_get_an
From: Ben Hutchings @ 2011-04-28 12:45 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: artpol, linux-kernel, davem, netdev
In-Reply-To: <20110427214902.0826a4d1@nehalam>

On Wed, 2011-04-27 at 21:49 -0700, Stephen Hemminger wrote:
> On Thu, 28 Apr 2011 10:49:14 +0700
> artpol <artpol84@gmail.com> wrote:
> 
> > Add support of pause frames advertise in mii_get_an. This provides all drivers 
> > that use mii_ethtool_gset to represent their own and Link partner flow control
> > abilities in ethtool.
> > 
> > Signed-off-by: Artem Polyakov <artpol84@gmail.com>
> > 
> > ---
> > 
> > --- linux-2.6.38.4/drivers/net/mii.c.orig	2011-04-28 08:46:13.000000000 +0700
> > +++ linux-2.6.38.4/drivers/net/mii.c	2011-04-25 23:04:20.694981968 +0700
> > @@ -49,6 +49,10 @@ static u32 mii_get_an(struct mii_if_info
> >  		result |= ADVERTISED_100baseT_Half;
> >  	if (advert & ADVERTISE_100FULL)
> >  		result |= ADVERTISED_100baseT_Full;
> > +	if (advert & ADVERTISE_PAUSE_CAP)
> > +		result |= ADVERTISED_Pause;
> > +	if (advert & ADVERTISE_PAUSE_ASYM)
> > +		result |= ADVERTISED_Asym_Pause;
> >  
> >  	return result;
> >  }
> 
> One common driver problem is that auto negotiation of pause
> is really a separate operation from negotiation of speed.
> It should be possible to force no flow-control but still negotiate
> speed and vice-versa.

If you want to force flow control then turn off pause autoneg.
If you want to force speed then turn off advertising of other speeds.

> The ethtool api breaks this into two operations
> but many drivers munge them together.

There is some interaction between set_pauseparam and set_settings if
pause autoneg is enabled, which is why we have the
mii_advertise_flowctrl() function.

Anyway, this patch covers reporting only.
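
As an illustration of the reporting-only mapping in the quoted hunk, here is a user-space sketch that translates MII advertisement register bits into ethtool-style flags. The DEMO_ constants are local names. Their values are what I believe linux/mii.h and linux/ethtool.h define, but treat them as assumptions and check the headers before reusing them.

```c
#include <assert.h>
#include <stdint.h>

/* Values assumed to match linux/mii.h and linux/ethtool.h; verify before reuse. */
#define DEMO_ADVERTISE_100FULL      0x0100u
#define DEMO_ADVERTISE_PAUSE_CAP    0x0400u
#define DEMO_ADVERTISE_PAUSE_ASYM   0x0800u
#define DEMO_ADVERTISED_100_FULL    (1u << 3)
#define DEMO_ADVERTISED_PAUSE       (1u << 13)
#define DEMO_ADVERTISED_ASYM_PAUSE  (1u << 14)

/* Translate an advertisement register value into ethtool-style flags. */
static uint32_t demo_get_an(uint16_t advert)
{
    uint32_t result = 0;

    if (advert & DEMO_ADVERTISE_100FULL)
        result |= DEMO_ADVERTISED_100_FULL;
    if (advert & DEMO_ADVERTISE_PAUSE_CAP)
        result |= DEMO_ADVERTISED_PAUSE;
    if (advert & DEMO_ADVERTISE_PAUSE_ASYM)
        result |= DEMO_ADVERTISED_ASYM_PAUSE;

    /* Only the flags handled above can be set. */
    assert((result >> 15) == 0);
    return result;
}
```

Nothing here changes what is negotiated; the function only reports bits that are already in the register.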

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

* Re: Is 802.3ad mode in bonding useful ?
From: Neil Horman @ 2011-04-28 12:21 UTC (permalink / raw)
  To: WeipingPan; +Cc: netdev
In-Reply-To: <4DB9185E.4050103@gmail.com>

On Thu, Apr 28, 2011 at 03:33:50PM +0800, WeipingPan wrote:
> Hi, all,
> 
> 802.3ad mode in bonding implements 802.3ad standard.
> 
> I am just wondering  802.3ad mode is useful,
> since  bonding has many modes like balance-rr, active-backup, etc.
> 
Yes, of course it's useful.  For switches which support 802.3ad, this mode
allows for both peers to understand that the links in the bond are acting as an
aggregate, which makes it easier to prevent things like inadvertently looped
back frames, for which the other modes have to have all sorts of hacks to
prevent.
Neil

> thanks
> Weiping Pan

* Re: [PATCH 08/13] netvm: Allow skb allocation to use PFMEMALLOC reserves
From: Mel Gorman @ 2011-04-28 11:18 UTC (permalink / raw)
  To: NeilBrown; +Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Peter Zijlstra
In-Reply-To: <20110428204755.2e07147e@notabene.brown>

On Thu, Apr 28, 2011 at 08:47:55PM +1000, NeilBrown wrote:
> On Thu, 28 Apr 2011 11:05:06 +0100 Mel Gorman <mgorman@suse.de> wrote:
> 
> > On Thu, Apr 28, 2011 at 04:19:33PM +1000, NeilBrown wrote:
> > > On Wed, 27 Apr 2011 17:08:06 +0100 Mel Gorman <mgorman@suse.de> wrote:
> > > 
> > > 
> > > > @@ -1578,7 +1589,7 @@ static inline struct sk_buff *netdev_alloc_skb_ip_align(struct net_device *dev,
> > > >   */
> > > >  static inline struct page *__netdev_alloc_page(struct net_device *dev, gfp_t gfp_mask)
> > > >  {
> > > > -	return alloc_pages_node(NUMA_NO_NODE, gfp_mask, 0);
> > > > +	return alloc_pages_node(NUMA_NO_NODE, gfp_mask | __GFP_MEMALLOC, 0);
> > > >  }
> > > >  
> > > 
> > > I'm puzzling a bit over this change.
> > > __netdev_alloc_page appears to be used to get pages to put in ring buffer
> > > for a network card to DMA received packets into.  So it is OK to use
> > > __GFP_MEMALLOC for these allocations providing we mark the resulting skb as
> > > 'pfmemalloc' if a reserved page was used.
> > > 
> > > However I don't see where that marking is done.
> > > I think it should be in skb_fill_page_desc, something like:
> > > 
> > >   if (page->pfmemalloc)
> > > 	skb->pfmemalloc = true;
> > > 
> > > Is this covered somewhere else that I am missing?
> > > 
> > 
> > You're not missing anything.
> > 
> > From the context of __netdev_alloc_page, we do not know if the skb
> > is suitable for marking pfmemalloc or not (we don't have SKB_ALLOC_RX
> > flag for example that __alloc_skb has). The reserves are potentially
> > being dipped into for an unsuitable packet but it gets dropped in
> > __netif_receive_skb() and the memory is returned. If we mark the skb
> > pfmemalloc as a result of __netdev_alloc_page using a reserve page, the
> > packets would not get dropped as expected.
> > 
> 
> The only code in __netif_receive_skb that seems to drop packets is
> 
> +	if (skb_pfmemalloc(skb) && !skb_pfmemalloc_protocol(skb))
> +		goto drop;
> +
> 
> which requires that the skb have pfmemalloc set before it will be dropped.
> 

Yes, I only wanted to drop the packet if we were under pressure
when the skb was allocated. If we hit pressure between when the skb was
allocated and when __netdev_alloc_page is called, then the PFMEMALLOC
reserves may be used for packet receive unnecessarily, but the next skb
allocation that grows slab will have the flag set appropriately. There
is a window during which we use reserves where we did not have to,
but it's limited. Again, the throttling that applies if the pfmemalloc
reserves get too depleted comes into play.

> Actually ... I'm expecting to find code that says:
>    if (skb_pfmemalloc(skb) && !sock_flag(sk, SOCK_MEMALLOC))
> 	drop_packet();
> 
> but I cannot find it.  Where is the code that discards pfmemalloc packets for
> non-memalloc sockets?
> 
> I can see similar code in sk_filter but that doesn't drop the packet, it just
> avoids filtering it.
> 

hmm, if sk_filter is returning -ENOMEM then things like
sock_queue_rcv_skb() return error and the skb does not get queued and I
expected it to get dropped. What did I miss?
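
To keep the two checks under discussion apart, here is a small user-space model. The first function mirrors the quoted __netif_receive_skb hunk. The second models the socket-level behaviour described above (a filter step returning -ENOMEM so that the skb is not queued), which is an assumption about code not quoted in this thread. All structure and function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel structures. */
struct demo_skb  { bool pfmemalloc; bool protocol_ok; };
struct demo_sock { bool memalloc; };

/* Early receive check, modelled on the quoted hunk:
 * drop reserve-backed packets for protocols that may not use reserves. */
static bool demo_drop_early(const struct demo_skb *skb)
{
    return skb->pfmemalloc && !skb->protocol_ok;
}

/* Socket-level check (assumed behaviour): a reserve-backed skb is refused
 * for a socket that is not allowed to use the reserves. */
static int demo_sock_filter(const struct demo_skb *skb,
                            const struct demo_sock *sk)
{
    assert(skb && sk);
    if (skb->pfmemalloc && !sk->memalloc)
        return -ENOMEM;
    return 0;
}
```

Both checks only fire when the skb itself is marked pfmemalloc, which is the crux of the question about where that marking happens.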

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: Maximum no of bytes Ethernet can transfer at a time ??
From: Neil Horman @ 2011-04-28 11:06 UTC (permalink / raw)
  To: Ajit; +Cc: netdev
In-Reply-To: <loom.20110428T074300-379@post.gmane.org>

On Thu, Apr 28, 2011 at 05:49:23AM +0000, Ajit wrote:
> Neil Horman <nhorman <at> tuxdriver.com> writes:
> 
> 
> Hi Neil,
> 
> I am using neither the transport layer nor the network layer. I have written a
> small protocol which works on a LAN without a router, so ideally there is no need
> for the network layer (I am using physical addresses for the transfer). 
> 
You may not be using the transport layer, but you are still using IP if you're
using SOCK_RAW sockets as you indicated.
> I have defined a field called "offset" and another one called "id".
> Whenever you send a file, it is divided into frames depending on the
> size of the file, and the frames are sent with the same "id" and an offset incremented by 1.
> 
> As for the firewall issues you mentioned, I don't think that is the problem, because I am
> sending the file to myself, but on eth0 rather than lo, because sending on lo
> won't give the expected results.
> 
Why not?

> If you still want some explanation then please ask me.
> 
Can you just post a link to the code?
Neil
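
While waiting for the code, the "id" and "offset" scheme described in the quoted text can be sketched as follows. The header layout and the names are hypothetical, since the actual protocol has not been posted. The 1500-byte figure is the standard Ethernet payload limit, and subtracting the header from it is an assumption about how the frames are built.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETH_PAYLOAD_MAX 1500u  /* standard Ethernet payload limit */

/* Hypothetical per-frame header: same id for one file, offset counts frames. */
struct demo_hdr {
    uint16_t id;
    uint16_t offset;
};

/* File bytes that fit in one frame after the header. */
static size_t demo_chunk_size(void)
{
    return DEMO_ETH_PAYLOAD_MAX - sizeof(struct demo_hdr);
}

/* Number of frames needed for a file of the given size. */
static size_t demo_frame_count(size_t file_size)
{
    size_t chunk = demo_chunk_size();

    return (file_size + chunk - 1) / chunk;
}

/* Number of file bytes carried by the frame with the given offset. */
static size_t demo_frame_len(size_t file_size, size_t offset)
{
    size_t chunk = demo_chunk_size();
    size_t remaining;

    assert(offset < demo_frame_count(file_size));
    remaining = file_size - offset * chunk;
    return remaining < chunk ? remaining : chunk;
}
```

A receiver can reassemble the file by writing each frame's data at offset times the chunk size, provided every frame with the same id arrives.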

> Thank you. 
> 
> 

* Re: [PATCH 08/13] netvm: Allow skb allocation to use PFMEMALLOC reserves
From: NeilBrown @ 2011-04-28 10:47 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Peter Zijlstra
In-Reply-To: <20110428100035.GO4658@suse.de>

On Thu, 28 Apr 2011 11:05:06 +0100 Mel Gorman <mgorman@suse.de> wrote:

> On Thu, Apr 28, 2011 at 04:19:33PM +1000, NeilBrown wrote:
> > On Wed, 27 Apr 2011 17:08:06 +0100 Mel Gorman <mgorman@suse.de> wrote:
> > 
> > 
> > > @@ -1578,7 +1589,7 @@ static inline struct sk_buff *netdev_alloc_skb_ip_align(struct net_device *dev,
> > >   */
> > >  static inline struct page *__netdev_alloc_page(struct net_device *dev, gfp_t gfp_mask)
> > >  {
> > > -	return alloc_pages_node(NUMA_NO_NODE, gfp_mask, 0);
> > > +	return alloc_pages_node(NUMA_NO_NODE, gfp_mask | __GFP_MEMALLOC, 0);
> > >  }
> > >  
> > 
> > I'm puzzling a bit over this change.
> > __netdev_alloc_page appears to be used to get pages to put in ring buffer
> > for a network card to DMA received packets into.  So it is OK to use
> > __GFP_MEMALLOC for these allocations providing we mark the resulting skb as
> > 'pfmemalloc' if a reserved page was used.
> > 
> > However I don't see where that marking is done.
> > I think it should be in skb_fill_page_desc, something like:
> > 
> >   if (page->pfmemalloc)
> > 	skb->pfmemalloc = true;
> > 
> > Is this covered somewhere else that I am missing?
> > 
> 
> You're not missing anything.
> 
> From the context of __netdev_alloc_page, we do not know if the skb
> is suitable for marking pfmemalloc or not (we don't have SKB_ALLOC_RX
> flag for example that __alloc_skb has). The reserves are potentially
> being dipped into for an unsuitable packet but it gets dropped in
> __netif_receive_skb() and the memory is returned. If we mark the skb
> pfmemalloc as a result of __netdev_alloc_page using a reserve page, the
> packets would not get dropped as expected.
> 

The only code in __netif_receive_skb that seems to drop packets is

+	if (skb_pfmemalloc(skb) && !skb_pfmemalloc_protocol(skb))
+		goto drop;
+

which requires that the skb have pfmemalloc set before it will be dropped.

Actually ... I'm expecting to find code that says:
   if (skb_pfmemalloc(skb) && !sock_flag(sk, SOCK_MEMALLOC))
	drop_packet();

but I cannot find it.  Where is the code that discards pfmemalloc packets for
non-memalloc sockets?

I can see similar code in sk_filter but that doesn't drop the packet, it just
avoids filtering it.

NeilBrown

* Re: [PATCH 12/13] mm: Throttle direct reclaimers if PF_MEMALLOC reserves are low and swap is backed by network storage
From: Mel Gorman @ 2011-04-28 10:14 UTC (permalink / raw)
  To: NeilBrown; +Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Peter Zijlstra
In-Reply-To: <20110428102244.6e1113e9@notabene.brown>

On Thu, Apr 28, 2011 at 10:22:44AM +1000, NeilBrown wrote:
> On Wed, 27 Apr 2011 17:08:10 +0100 Mel Gorman <mgorman@suse.de> wrote:
> 
> 
> > +/*
> > + * Throttle direct reclaimers if backing storage is backed by the network
> > + * and the PFMEMALLOC reserve for the preferred node is getting dangerously
> > + * depleted. kswapd will continue to make progress and wake the processes
> > + * when the low watermark is reached
> > + */
> > +static void throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
> > +					nodemask_t *nodemask)
> > +{
> > +	struct zone *zone;
> > +	int high_zoneidx = gfp_zone(gfp_mask);
> > +	DEFINE_WAIT(wait);
> > +
> > +	/* Check if the pfmemalloc reserves are ok */
> > +	first_zones_zonelist(zonelist, high_zoneidx, NULL, &zone);
> > +	if (pfmemalloc_watermark_ok(zone->zone_pgdat, high_zoneidx))
> > +		return;
> 
> As the first thing that "wait_event_interruptible" does is test the condition
> and return if it is true, this "if () return;" is pointless.
>  

In patch 13, we count the number of times we got throttled. In this
patch, the check is pointless but it makes sense in the context of
the following patch.

> > +
> > +	/* Throttle */
> > +	wait_event_interruptible(zone->zone_pgdat->pfmemalloc_wait,
> > +		pfmemalloc_watermark_ok(zone->zone_pgdat, high_zoneidx));
> > +}
> 
> I was surprised that you chose wait_event_interruptible as your previous code
> was almost exactly "wait_event_killable".
> 
> Is there some justification for not throttling processes which happen to have
> a (non-fatal) signal pending?
> 

No justification, wait_event_killable() is indeed a better fit.
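
For anyone following along, the difference between the two wait flavours can be modelled as a small decision function. This is a toy model of the documented semantics (an interruptible wait ends on any pending signal, a killable wait only on a fatal one), not kernel code. Every name in it is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_signal    { DEMO_SIG_NONE, DEMO_SIG_NONFATAL, DEMO_SIG_FATAL };
enum demo_wait_mode { DEMO_WAIT_INTERRUPTIBLE, DEMO_WAIT_KILLABLE };

/* Returns true if a task waiting in the given mode would stop waiting. */
static bool demo_wait_ends(enum demo_wait_mode mode, bool condition,
                           enum demo_signal sig)
{
    assert(mode == DEMO_WAIT_INTERRUPTIBLE || mode == DEMO_WAIT_KILLABLE);

    if (condition)
        return true;            /* watermark is ok again */
    if (sig == DEMO_SIG_FATAL)
        return true;            /* both modes give up on a fatal signal */
    if (sig == DEMO_SIG_NONFATAL)
        return mode == DEMO_WAIT_INTERRUPTIBLE;
    return false;               /* keep throttling */
}
```

With the killable variant, a task with an ordinary signal pending stays throttled, which is the behaviour asked about in the quoted question.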

> > +
> >  unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
> >  				gfp_t gfp_mask, nodemask_t *nodemask)
> >  {
> > @@ -2133,6 +2172,15 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
> >  		.nodemask = nodemask,
> >  	};
> >  
> > +	throttle_direct_reclaim(gfp_mask, zonelist, nodemask);
> > +
> > +	/*
> > +	 * Do not enter reclaim if fatal signal is pending. 1 is returned so
> > +	 * that the page allocator does not consider triggering OOM
> > +	 */
> > +	if (fatal_signal_pending(current))
> > +		return 1;
> > +
> >  	trace_mm_vmscan_direct_reclaim_begin(order,
> >  				sc.may_writepage,
> >  				gfp_mask);
> > @@ -2488,6 +2536,12 @@ loop_again:
> >  			}
> >  
> >  		}
> > +
> > +		/* Wake throttled direct reclaimers if low watermark is met */
> > +		if (waitqueue_active(&pgdat->pfmemalloc_wait) &&
> > +				pfmemalloc_watermark_ok(pgdat, MAX_NR_ZONES - 1))
> > +			wake_up_interruptible(&pgdat->pfmemalloc_wait);
> > +
> >  		if (all_zones_ok || (order && pgdat_balanced(pgdat, balanced, *classzone_idx)))
> >  			break;		/* kswapd: all done */
> >  		/*
> 

-- 
Mel Gorman
SUSE Labs

* Re: [PATCH 08/13] netvm: Allow skb allocation to use PFMEMALLOC reserves
From: Mel Gorman @ 2011-04-28 10:05 UTC (permalink / raw)
  To: NeilBrown; +Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Peter Zijlstra
In-Reply-To: <20110428161933.1f1266e6@notabene.brown>

On Thu, Apr 28, 2011 at 04:19:33PM +1000, NeilBrown wrote:
> On Wed, 27 Apr 2011 17:08:06 +0100 Mel Gorman <mgorman@suse.de> wrote:
> 
> 
> > @@ -1578,7 +1589,7 @@ static inline struct sk_buff *netdev_alloc_skb_ip_align(struct net_device *dev,
> >   */
> >  static inline struct page *__netdev_alloc_page(struct net_device *dev, gfp_t gfp_mask)
> >  {
> > -	return alloc_pages_node(NUMA_NO_NODE, gfp_mask, 0);
> > +	return alloc_pages_node(NUMA_NO_NODE, gfp_mask | __GFP_MEMALLOC, 0);
> >  }
> >  
> 
> I'm puzzling a bit over this change.
> __netdev_alloc_page appears to be used to get pages to put in ring buffer
> for a network card to DMA received packets into.  So it is OK to use
> __GFP_MEMALLOC for these allocations providing we mark the resulting skb as
> 'pfmemalloc' if a reserved page was used.
> 
> However I don't see where that marking is done.
> I think it should be in skb_fill_page_desc, something like:
> 
>   if (page->pfmemalloc)
> 	skb->pfmemalloc = true;
> 
> Is this covered somewhere else that I am missing?
> 

You're not missing anything.

From the context of __netdev_alloc_page, we do not know if the skb
is suitable for marking pfmemalloc or not (we don't have SKB_ALLOC_RX
flag for example that __alloc_skb has). The reserves are potentially
being dipped into for an unsuitable packet but it gets dropped in
__netif_receive_skb() and the memory is returned. If we mark the skb
pfmemalloc as a result of __netdev_alloc_page using a reserve page, the
packets would not get dropped as expected.

-- 
Mel Gorman
SUSE Labs

* Re: [PATCH 02/13] mm: sl[au]b: Add knowledge of PFMEMALLOC reserve pages
From: Mel Gorman @ 2011-04-28  9:46 UTC (permalink / raw)
  To: NeilBrown; +Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Peter Zijlstra
In-Reply-To: <20110428092110.608eb354@notabene.brown>

On Thu, Apr 28, 2011 at 09:21:10AM +1000, NeilBrown wrote:
> On Tue, 26 Apr 2011 14:59:40 +0100 Mel Gorman <mgorman@suse.de> wrote:
> 
> > On Tue, Apr 26, 2011 at 09:37:58PM +1000, NeilBrown wrote:
> > > On Tue, 26 Apr 2011 08:36:43 +0100 Mel Gorman <mgorman@suse.de> wrote:
> > > 
> > > > +		/*
> > > > +		 * If there are full empty slabs and we were not forced to
> > > > +		 * allocate a slab, mark this one !pfmemalloc
> > > > +		 */
> > > > +		l3 = cachep->nodelists[numa_mem_id()];
> > > > +		if (!list_empty(&l3->slabs_free) && force_refill) {
> > > > +			struct slab *slabp = virt_to_slab(objp);
> > > > +			slabp->pfmemalloc = false;
> > > > +			clear_obj_pfmemalloc(&objp);
> > > > +			check_ac_pfmemalloc(cachep, ac);
> > > > +			return objp;
> > > > +		}
> > > 
> > > The comment doesn't match the code.  I think you need to remove the words
> > > "full" and "not" assuming the code is correct which it probably is...
> > > 
> > 
> > I'll fix up the comment, you're right, it's confusing.
> > 
> > > But the code seems to be much more complex than Peter's original, and I don't
> > > see the gain.
> > > 
> > 
> > You're right, it is more complex.
> > 
> > > Peter's code had only one 'reserved' flag for each kmem_cache. 
> > 
> > The reserve was set in a per-cpu structure so there was a "lag" time
> > before that information was available to other CPUs. Fine on smaller
> > machines but a bit more of a problem today. 
> > 
> > > You seem to
> > > have one for every slab.  I don't see the point.
> > > It is true that yours is in some sense more fair - but I'm not sure the
> > > complexity is worth it.
> > > 
> > 
> > More fairness was one of the objectives.
> > 
> > > Was there some particular reason you made the change?
> > > 
> > 
> > This version survives under considerably more stress than Peter's
> > original version did without requiring the additional complexity of
> > memory reserves.
> > 
> 
> That is certainly a very compelling argument .... but I still don't get why.
> I'm sorry if I'm being dense, but I still don't see why the complexity buys
> us better stability and I really would like to understand.
> 
> You don't seem to need the same complexity for SLUB with the justification
> of "SLUB generally maintaining smaller lists than SLAB".
> 

It is an educated guess that the length of the lists was what was
relevant. Even without these patches, SLUB is harder to lockup (minutes
rather than seconds to halt the machine) than SLAB and I assumed it
was because there were fewer pages pinned on per-CPU lists with SLUB.

> Presumably these are per-CPU lists of free objects or slabs? 

The per-cpu lists are of objects (the entry[] array in struct
array_cache). The slab management structure is looked up much less
frequently (when a block of objects are being freed for example).

> If the things
> on those lists could be used by anyone long lists couldn't hurt.

Long lists can hurt in a few ways but I believe the two relevant reasons
for this series are;

1. A remote CPU could be holding the object on its free list
2. Multiple unnecessary caches could be pinning free memory with the
   shrinkers not triggering because everything is waiting on IO to complete

> So the problem must be that the lists get long while the array_cache is still
> marked as 'pfmemalloc'

By marking the array_cache pfmemalloc we can have objects on the list
that are a mix of pfmemalloc and !pfmemalloc objects with very coarse
control over who is accessing them.

For example, in Peter's patches, CPU A could allocate from pfmemalloc
reserves and mark its array_cache appropriately. CPU B could be freeing
the objects but not have its array_cache marked. PFMEMALLOC objects
are now available for !PFMEMALLOC uses on that CPU and we dip further
into our reserves. This was managed by the memory reservation patches
which meant that dipping further into the reserves was not that much
of a problem.

> (or 'reserve' in Peter's patches).
> 
> Is that the problem?  That reserve memory gets locked up in SLAB freelists?
> If so - would that be more easily addressed by effectively reducing the
> 'batching' when the array_cache had dipped into reserves, so slabs are
> returned to the VM more promptly?
> 

Pinning objects on long lists is one problem but I don't think it's
the most important problem. Indications were that the big problem
was insufficient control over who was accessing objects belonging to
slab pages allocated from the pfmemalloc reserves.

> Probably related, now that you've fixed the comment here (thanks):
> 
> +		/*
> +		 * If there are empty slabs on the slabs_free list and we are
> +		 * being forced to refill the cache, mark this one !pfmemalloc.
> +		 */
> +		l3 = cachep->nodelists[numa_mem_id()];
> +		if (!list_empty(&l3->slabs_free) && force_refill) {
> +			struct slab *slabp = virt_to_slab(objp);
> +			slabp->pfmemalloc = false;
> +			clear_obj_pfmemalloc(&objp);
> +			check_ac_pfmemalloc(cachep, ac);
> +			return objp;
> +		}
> 
> I'm trying to understand it...

Thanks :)

> The context is that a non-MEMALLOC allocation is happening and everything on
> the free lists is reserved for MEMALLOC allocations.

Yep.

> So if in that case there is a completely free slab on the free list we decide
> that it is OK to mark the current slab as non-MEMALLOC.

Yep.

> The logic seems to be that we could just release that free slab to the VM,
> then an alloc_page would be able to get it back. 

We could, it'd be slower and there would need to be some sort of
retry path in a hotter code path to catch the situation but we could.

> But if we are still well
> below the reserve watermark, then there might be some other allocation that
> is more deserving of the page and we shouldn't just assume we can take
> it without actually calling in to alloc_pages to check that we are no longer
> running on reserves.
> 
> So this looks like an optimisation that is wrong.

The problem that is being addressed here is that pfmemalloc slabs
have to be made available for general use at some point or slabs can
be artificially large and waste memory. I didn't do a free and retry
path because it'd be more expensive and I wanted to avoid hurting
the common slab paths.

The assumption is that if there are free slab pages while there are
pfmemalloc slabs in use then we cannot be under that much pressure
and the free slab page is safe to use. If we go well below the reserve
watermark as you are concerned about, the throttle logic will trigger
and we'll at least identify that the situation occurred without the
system crashing.
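
The rule being debated can be written down as a toy model. It is only a sketch of the decision as described in this thread (a per-slab pfmemalloc flag that is cleared when a forced refill finds completely free slabs), with hypothetical structures and none of the real SLAB bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a slab and its node lists. */
struct demo_slab { bool pfmemalloc; };
struct demo_node { unsigned int free_slabs; };

/* May this caller take an object from the slab?
 * Clears the slab's pfmemalloc flag when the assumption above applies. */
static bool demo_may_use(struct demo_slab *slab, const struct demo_node *node,
                         bool caller_memalloc, bool force_refill)
{
    assert(slab && node);

    /* Ordinary slabs, or callers entitled to the reserves, always pass. */
    if (!slab->pfmemalloc || caller_memalloc)
        return true;

    /* Free slabs exist, so pressure is assumed to be low. */
    if (node->free_slabs > 0 && force_refill) {
        slab->pfmemalloc = false;
        return true;
    }
    return false;
}
```

The model makes the objection easy to see: the only evidence of low pressure it consults is the free slab list, not the page allocator's watermarks.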

> BTW, 
> 
> 
> +	/* Record if ALLOC_PFMEMALLOC was set when allocating the slab */
> +	if (pfmemalloc) {
> +		struct array_cache *ac = cpu_cache_get(cachep);
> +		slabp->pfmemalloc = true;
> +		ac->pfmemalloc = 1;
> +	}
> +
> 
> I think that "= 1"  should be "= true".  :-)
> 

/me slaps self

Thanks

-- 
Mel Gorman
SUSE Labs

* Re: [PATCH] net: ftmac100: fix scheduling while atomic during PHY link status change
From: Po-Yu Chuang @ 2011-04-28  9:11 UTC (permalink / raw)
  To: Adam Jaremko
  Cc: Ratbert Po-Yu Chuang(莊博宇), netdev,
	David Miller, Eric Dumazet
In-Reply-To: <BANLkTi=gcFB4mbmWKu4neep77Jp8R4HWGQ@mail.gmail.com>

Hi Adam,

2011/4/27 Adam Jaremko <adam.jaremko@gmail.com>:
> Signed-off-by: Adam Jaremko <adam.jaremko@gmail.com>
> ---
>  drivers/net/ftmac100.c |    8 ++++----
>  1 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ftmac100.c b/drivers/net/ftmac100.c
> index a316619..9bd7746 100644
> --- a/drivers/net/ftmac100.c
> +++ b/drivers/net/ftmac100.c
> @@ -139,11 +139,11 @@ static int ftmac100_reset(struct ftmac100 *priv)
>                         * that hardware reset completed (what the f*ck).
>                         * We still need to wait for a while.
>                         */
> -                       usleep_range(500, 1000);
> +                       udelay(500);
>                        return 0;
>                }
>
> -               usleep_range(1000, 10000);
> +               udelay(1000);
>        }
>
>        netdev_err(netdev, "software reset failed\n");
> @@ -772,7 +772,7 @@ static int ftmac100_mdio_read(struct net_device
> *netdev, int phy_id, int reg)
>                if ((phycr & FTMAC100_PHYCR_MIIRD) == 0)
>                        return phycr & FTMAC100_PHYCR_MIIRDATA;
>
> -               usleep_range(100, 1000);
> +               udelay(100);
>        }
>
>        netdev_err(netdev, "mdio read timed out\n");
> @@ -801,7 +801,7 @@ static void ftmac100_mdio_write(struct net_device
> *netdev, int phy_id, int reg,
>                if ((phycr & FTMAC100_PHYCR_MIIWR) == 0)
>                        return;
>
> -               usleep_range(100, 1000);
> +               udelay(100);
>        }
>
>        netdev_err(netdev, "mdio write timed out\n");
> --
> 1.7.4.4
>

Not sure if your patch is indeed corrupt or it's my fault
while downloading patch from gmail:

ERROR: patch seems to be corrupt (line wrapped?)
#62: FILE: drivers/net/ftmac100.c:771:
*netdev, int phy_id, int reg)

total: 1 errors, 0 warnings, 31 lines checked

ftmac100-fix.patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.


Other than that,

Acked-by: Po-Yu Chuang <ratbert@faraday-tech.com>

best regards,
Po-Yu Chuang
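
For context on why the quoted patch swaps usleep_range() for udelay(): the former may sleep, which is not allowed in atomic context, while the latter busy-waits. The pattern itself, a bounded poll with a delay between reads, can be sketched in user space as below. The register reader and the delay are injected as function pointers, and all names (including the fake device used to exercise the loop) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t (*demo_read_fn)(void *ctx);
typedef void (*demo_delay_fn)(void *ctx, unsigned int usecs);

/* Poll until busy_bit clears or 'tries' reads have been made.
 * Returns 0 on success, -1 on timeout. */
static int demo_poll_clear(demo_read_fn read_reg, demo_delay_fn delay,
                           void *ctx, uint32_t busy_bit,
                           unsigned int tries, unsigned int usecs)
{
    assert(read_reg && delay);
    for (unsigned int i = 0; i < tries; i++) {
        if ((read_reg(ctx) & busy_bit) == 0)
            return 0;
        delay(ctx, usecs);  /* in a driver this would be a busy-wait */
    }
    return -1;
}

/* Fake device for demonstration: stays busy for a number of reads. */
struct demo_fake {
    unsigned int busy_reads;  /* reads that still report busy */
    unsigned int waited;      /* total microseconds "delayed" */
};

static uint32_t demo_fake_read(void *ctx)
{
    struct demo_fake *f = ctx;

    if (f->busy_reads > 0) {
        f->busy_reads--;
        return 0x1;
    }
    return 0;
}

static void demo_fake_delay(void *ctx, unsigned int usecs)
{
    struct demo_fake *f = ctx;

    f->waited += usecs;
}
```

The retry bound matters more with a busy-wait than with a sleep, because the CPU is held for the whole duration of the loop.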

* Re: [PATCH 6/6 v8] sctp: Add ASCONF operation on the single-homed host
From: Wei Yongjun @ 2011-04-28  8:59 UTC (permalink / raw)
  To: Michio Honda; +Cc: netdev, YOSHIFUJI Hideaki
In-Reply-To: <5183C2C0-C377-4FF7-89A1-3F19D53FBBBF@sfc.wide.ad.jp>


>>>>> +			/* reuse the parameter length from the same scope one */
>>>>> +			totallen += paramlen;
>>>>> +			totallen += addr_param_len;
>>>>> +			del_pickup = 1;
>>>>> +			asoc->src_out_of_asoc_ok = 1;
>>>> src_out_of_asoc_ok should be marked when the last address is
>>>> assigned to asoc->asconf_addr_del_pending?
>>> I thought src_out_of_asoc_ok should be set when the candidate new address appears, rather than when the last address is being deleted.  
>>> Until such an address appears, there is no situation in which any chunk is sent from an address not in the association.  
>> Once the last address has been marked as DEL, it will never be used
>> for sending chunks. At this point there is no valid address on the
>> host, so the host cannot send chunks.
> I understand that marking out_of_asoc_ok when the last address is being deleted does the same thing.  
> However, the out_of_asoc_ok state is not a regular one, so I think we should shorten its duration as much as possible.  
> But this is my personal opinion, so if you don't agree, I will mark it when the last address is being deleted.  

You could run a test for this (I have not tested it :-) ):

remove all the addresses while data is still being sent, wait
some time before the new address is added, and see
whether there is any difference between the two approaches. I am not
sure whether one or both are right. If they behave the same, leave
the code unchanged.

>>>>> +			SCTP_DEBUG_PRINTK("mkasconf_update_ip: picked same-scope del_pending addr, totallen for all addresses is %d\n", totallen);
>>>>> +		}
>>>>> 	}
>>>>>
>>>>> 	/* Create an asconf chunk with the required length. */
>>>>> @@ -2802,6 +2819,19 @@ struct sctp_chunk *sctp_make_asconf_update_ip(struct sctp_association *asoc,
>>>>>
>>>>> 		addr_buf += af->sockaddr_len;
>>>>> 	}
>>>>> +	if (flags == SCTP_PARAM_ADD_IP && del_pickup) {
>>>>> +		addr = asoc->asconf_addr_del_pending;
>>>>> +		del_af = sctp_get_af_specific(addr->v4.sin_family);
>>>>> +		del_addr_param_len = del_af->to_addr_param(addr,
>>>>> +		    &del_addr_param);
>>>>> +		del_param.param_hdr.type = SCTP_PARAM_DEL_IP;
>>>>> +		del_param.param_hdr.length = htons(del_paramlen +
>>>>> +		    del_addr_param_len);
>>>>> +		del_param.crr_id = i;
>>>>> +
>>>>> +		sctp_addto_chunk(retval, del_paramlen, &del_param);
>>>>> +		sctp_addto_chunk(retval, del_addr_param_len, &del_addr_param);
>>>>> +	}
>>>>> 	return retval;
>>>>> }
>>>> How about clean up this part as:
>>>>
>>>> +       if (...) {
>>>> +               addr = asoc->asconf_addr_del_pending;
>>>> +               af = sctp_get_af_specific(addr->v4.sin_family);
>>>> +               addr_param_len = af->to_addr_param(addr, &addr_param);
>>>> +               totallen += paramlen;
>>>> +               totallen += addr_param_len;
>>>> +		...
>>>> +       }
>>>> +
>>>>      /* Create an asconf chunk with the required length. */
>>>>      retval = sctp_make_asconf(asoc, laddr, totallen);
>>>>      if (!retval)
>>>> @@ -2802,6 +2812,18 @@ struct sctp_chunk *sctp_make_asconf_update_ip(struct sctp_association *asoc,
>>>>
>>>>              addr_buf += af->sockaddr_len;
>>>>      }
>>>> +
>>>> +       if (...) {
>>>> +               addr = asoc->asconf_addr_del_pending;
>>>> +               af = sctp_get_af_specific(addr->v4.sin_family);
>>>> +               addr_param_len = af->to_addr_param(addr, &addr_param);
>>>> +               param.param_hdr.type = SCTP_PARAM_DEL_IP;
>>>> +               param.param_hdr.length = htons(paramlen + addr_param_len);
>>>> +               param.crr_id = i;
>>>> +
>>>> +               sctp_addto_chunk(retval, paramlen, &param);
>>>> +               sctp_addto_chunk(retval, addr_param_len, &addr_param);
>>>> +       }
>>> Agreed on reusing af instead of defining del_af.

^ permalink raw reply

* [PATCH net-next-2.6] net: dont hold rtnl mutex during netlink dump callbacks
From: Eric Dumazet @ 2011-04-28  8:56 UTC (permalink / raw)
  To: David Miller; +Cc: Patrick McHardy, netdev, Remi Denis-Courmont

Four years ago, Patrick made a change to hold rtnl mutex during netlink
dump callbacks.

I believe it was a wrong move. This slows down concurrent dumps, making
good old /proc/net/ files faster than rtnetlink in some situations.

This occurred to me because one "ip link show dev ..." was _very_ slow
on a workload adding/removing network devices in background.

All dump callbacks are able to use RCU locking now, so this patch does
roughly a revert of commits :

1c2d670f366 : [RTNETLINK]: Hold rtnl_mutex during netlink dump callbacks
6313c1e0992 : [RTNETLINK]: Remove unnecessary locking in dump callbacks

This lets writers fight for the rtnl mutex while readers go full speed.

It also takes care of phonet: phonet_route_get() is now called from an RCU
read-side section, so I renamed it to phonet_route_get_rcu().

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Remi Denis-Courmont <remi.denis-courmont@nokia.com>
---
 include/net/phonet/pn_dev.h |    2 +-
 net/bridge/br_netlink.c     |    7 ++++---
 net/core/fib_rules.c        |    3 ++-
 net/core/rtnetlink.c        |   12 +++++-------
 net/decnet/dn_dev.c         |   10 ++++++----
 net/ipv6/ip6_fib.c          |    4 +++-
 net/phonet/pn_dev.c         |    6 +-----
 net/phonet/pn_netlink.c     |    4 +++-
 8 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/include/net/phonet/pn_dev.h b/include/net/phonet/pn_dev.h
index 13649eb..8639de5 100644
--- a/include/net/phonet/pn_dev.h
+++ b/include/net/phonet/pn_dev.h
@@ -51,7 +51,7 @@ void phonet_address_notify(int event, struct net_device *dev, u8 addr);
 int phonet_route_add(struct net_device *dev, u8 daddr);
 int phonet_route_del(struct net_device *dev, u8 daddr);
 void rtm_phonet_notify(int event, struct net_device *dev, u8 dst);
-struct net_device *phonet_route_get(struct net *net, u8 daddr);
+struct net_device *phonet_route_get_rcu(struct net *net, u8 daddr);
 struct net_device *phonet_route_output(struct net *net, u8 daddr);
 
 #define PN_NO_ADDR	0xff
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 134a2ff..ffb0dc4 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -120,8 +120,9 @@ static int br_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
 	int idx;
 
 	idx = 0;
-	for_each_netdev(net, dev) {
-		struct net_bridge_port *port = br_port_get_rtnl(dev);
+	rcu_read_lock();
+	for_each_netdev_rcu(net, dev) {
+		struct net_bridge_port *port = br_port_get_rcu(dev);
 
 		/* not a bridge port */
 		if (!port || idx < cb->args[0])
@@ -135,7 +136,7 @@ static int br_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
 skip:
 		++idx;
 	}
-
+	rcu_read_unlock();
 	cb->args[0] = idx;
 
 	return skb->len;
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 8248ebb..3911586 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -590,7 +590,8 @@ static int dump_rules(struct sk_buff *skb, struct netlink_callback *cb,
 	int idx = 0;
 	struct fib_rule *rule;
 
-	list_for_each_entry(rule, &ops->rules_list, list) {
+	rcu_read_lock();
+	list_for_each_entry_rcu(rule, &ops->rules_list, list) {
 		if (idx < cb->args[1])
 			goto skip;
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index d7c4bb4..2963312 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1007,10 +1007,11 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
 	s_h = cb->args[0];
 	s_idx = cb->args[1];
 
+	rcu_read_lock();
 	for (h = s_h; h < NETDEV_HASHENTRIES; h++, s_idx = 0) {
 		idx = 0;
 		head = &net->dev_index_head[h];
-		hlist_for_each_entry(dev, node, head, index_hlist) {
+		hlist_for_each_entry_rcu(dev, node, head, index_hlist) {
 			if (idx < s_idx)
 				goto cont;
 			if (rtnl_fill_ifinfo(skb, dev, RTM_NEWLINK,
@@ -1023,6 +1024,7 @@ cont:
 		}
 	}
 out:
+	rcu_read_unlock();
 	cb->args[1] = idx;
 	cb->args[0] = h;
 
@@ -1879,7 +1881,6 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 	int min_len;
 	int family;
 	int type;
-	int err;
 
 	type = nlh->nlmsg_type;
 	if (type > RTM_MAX)
@@ -1906,11 +1907,8 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 		if (dumpit == NULL)
 			return -EOPNOTSUPP;
 
-		__rtnl_unlock();
 		rtnl = net->rtnl;
-		err = netlink_dump_start(rtnl, skb, nlh, dumpit, NULL);
-		rtnl_lock();
-		return err;
+		return netlink_dump_start(rtnl, skb, nlh, dumpit, NULL);
 	}
 
 	memset(rta_buf, 0, (rtattr_max * sizeof(struct rtattr *)));
@@ -1980,7 +1978,7 @@ static int __net_init rtnetlink_net_init(struct net *net)
 {
 	struct sock *sk;
 	sk = netlink_kernel_create(net, NETLINK_ROUTE, RTNLGRP_MAX,
-				   rtnetlink_rcv, &rtnl_mutex, THIS_MODULE);
+				   rtnetlink_rcv, NULL, THIS_MODULE);
 	if (!sk)
 		return -ENOMEM;
 	net->rtnl = sk;
diff --git a/net/decnet/dn_dev.c b/net/decnet/dn_dev.c
index 0dcaa90..404fa15 100644
--- a/net/decnet/dn_dev.c
+++ b/net/decnet/dn_dev.c
@@ -752,7 +752,8 @@ static int dn_nl_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
 	skip_naddr = cb->args[1];
 
 	idx = 0;
-	for_each_netdev(&init_net, dev) {
+	rcu_read_lock();
+	for_each_netdev_rcu(&init_net, dev) {
 		if (idx < skip_ndevs)
 			goto cont;
 		else if (idx > skip_ndevs) {
@@ -761,11 +762,11 @@ static int dn_nl_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
 			skip_naddr = 0;
 		}
 
-		if ((dn_db = rtnl_dereference(dev->dn_ptr)) == NULL)
+		if ((dn_db = rcu_dereference(dev->dn_ptr)) == NULL)
 			goto cont;
 
-		for (ifa = rtnl_dereference(dn_db->ifa_list), dn_idx = 0; ifa;
-		     ifa = rtnl_dereference(ifa->ifa_next), dn_idx++) {
+		for (ifa = rcu_dereference(dn_db->ifa_list), dn_idx = 0; ifa;
+		     ifa = rcu_dereference(ifa->ifa_next), dn_idx++) {
 			if (dn_idx < skip_naddr)
 				continue;
 
@@ -778,6 +779,7 @@ cont:
 		idx++;
 	}
 done:
+	rcu_read_unlock();
 	cb->args[0] = idx;
 	cb->args[1] = dn_idx;
 
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index dd88df0..4076a0b 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -394,10 +394,11 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 	arg.net = net;
 	w->args = &arg;
 
+	rcu_read_lock();
 	for (h = s_h; h < FIB6_TABLE_HASHSZ; h++, s_e = 0) {
 		e = 0;
 		head = &net->ipv6.fib_table_hash[h];
-		hlist_for_each_entry(tb, node, head, tb6_hlist) {
+		hlist_for_each_entry_rcu(tb, node, head, tb6_hlist) {
 			if (e < s_e)
 				goto next;
 			res = fib6_dump_table(tb, skb, cb);
@@ -408,6 +409,7 @@ next:
 		}
 	}
 out:
+	rcu_read_unlock();
 	cb->args[1] = e;
 	cb->args[0] = h;
 
diff --git a/net/phonet/pn_dev.c b/net/phonet/pn_dev.c
index 947038d..47b3452 100644
--- a/net/phonet/pn_dev.c
+++ b/net/phonet/pn_dev.c
@@ -426,18 +426,14 @@ int phonet_route_del(struct net_device *dev, u8 daddr)
 	return 0;
 }
 
-struct net_device *phonet_route_get(struct net *net, u8 daddr)
+struct net_device *phonet_route_get_rcu(struct net *net, u8 daddr)
 {
 	struct phonet_net *pnn = phonet_pernet(net);
 	struct phonet_routes *routes = &pnn->routes;
 	struct net_device *dev;
 
-	ASSERT_RTNL(); /* no need to hold the device */
-
 	daddr >>= 2;
-	rcu_read_lock();
 	dev = rcu_dereference(routes->table[daddr]);
-	rcu_read_unlock();
 	return dev;
 }
 
diff --git a/net/phonet/pn_netlink.c b/net/phonet/pn_netlink.c
index 58b3b1f..438accb 100644
--- a/net/phonet/pn_netlink.c
+++ b/net/phonet/pn_netlink.c
@@ -264,10 +264,11 @@ static int route_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
 	struct net *net = sock_net(skb->sk);
 	u8 addr, addr_idx = 0, addr_start_idx = cb->args[0];
 
+	rcu_read_lock();
 	for (addr = 0; addr < 64; addr++) {
 		struct net_device *dev;
 
-		dev = phonet_route_get(net, addr << 2);
+		dev = phonet_route_get_rcu(net, addr << 2);
 		if (!dev)
 			continue;
 
@@ -279,6 +280,7 @@ static int route_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
 	}
 
 out:
+	rcu_read_unlock();
 	cb->args[0] = addr_idx;
 	cb->args[1] = 0;
 



^ permalink raw reply related

* Re: [PATCH] Applying inappropriate ioctl operation on socket should return ENOTTY
From: Lifeng Sun @ 2011-04-28  8:49 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel, netdev
In-Reply-To: <20110427130904.47331a9f@lxorguk.ukuu.org.uk>

On Wed, Apr 27, 2011 at 8:09 PM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
>> diff --git a/drivers/char/viotape.c b/drivers/char/viotape.c
>> index ad6e64a..a427d40 100644
>> --- a/drivers/char/viotape.c
>> +++ b/drivers/char/viotape.c
>> @@ -529,7 +529,7 @@ static int viotap_ioctl(struct inode *inode, struct file *file,
>>
>>       down(&reqSem);
>>
>> -     ret = -EINVAL;
>> +     ret = -ENOTTY;
>
> Again this messes up the returns because code assumes the initial
> default.

> The original code also has bugs too (wrong error off
> copy_*_user() again)

It seems alright.

- Lifeng

^ permalink raw reply

* Re: [PATCH] usbnet: add support for some Huawei modems with cdc-ether ports
From: Oliver Neukum @ 2011-04-28  8:46 UTC (permalink / raw)
  To: Dan Williams; +Cc: netdev, linux-usb
In-Reply-To: <1303934071.25456.2.camel@dcbw.foobar.com>

Am Mittwoch, 27. April 2011, 21:54:28 schrieb Dan Williams:
> Some newer Huawei devices (T-Mobile Rocket, others) have cdc-ether
> compatible ports, so recognize and expose them.
> 
> Signed-off-by: Dan Williams <dcbw@redhat.com>

Acked-by: Oliver Neukum <oneukum@suse.de>

	Regards
		Oliver

^ permalink raw reply

* [PATCH 1/5] networking: inappropriate ioctl operation should return ENOTTY
From: Lifeng Sun @ 2011-04-28  8:04 UTC (permalink / raw)
  To: Alan Cox; +Cc: David S. Miller, netdev, linux-kernel, Lifeng Sun
In-Reply-To: <20110427130904.47331a9f@lxorguk.ukuu.org.uk>

ioctl() calls against a socket with an inappropriate ioctl operation
are incorrectly returning EINVAL rather than ENOTTY:

  [ENOTTY]
      Inappropriate I/O control operation.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=33992

Signed-off-by: Lifeng Sun <lifongsun@gmail.com>
---
 net/core/dev.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index c2ac599..856b6ee 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4773,7 +4773,7 @@ static int dev_ifsioc_locked(struct net *net, struct ifreq *ifr, unsigned int cm
 		 * is never reached
 		 */
 		WARN_ON(1);
-		err = -EINVAL;
+		err = -ENOTTY;
 		break;
 
 	}
@@ -5041,7 +5041,7 @@ int dev_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 		/* Set the per device memory buffer space.
 		 * Not applicable in our case */
 	case SIOCSIFLINK:
-		return -EINVAL;
+		return -ENOTTY;
 
 	/*
 	 *	Unknown or private ioctl.
@@ -5062,7 +5062,7 @@ int dev_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 		/* Take care of Wireless Extensions */
 		if (cmd >= SIOCIWFIRST && cmd <= SIOCIWLAST)
 			return wext_handle_ioctl(net, &ifr, cmd, arg);
-		return -EINVAL;
+		return -ENOTTY;
 	}
 }
 
-- 
1.7.5.rc1

^ permalink raw reply related
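
The distinction the patch above draws between ENOTTY and EINVAL can be observed from user space. The following is only a minimal sketch, not part of the patch: the names classify_ioctl_errno() and probe_socket_ioctl() are hypothetical, and using the terminal-only TCGETS request on a UDP socket is an assumption chosen for illustration. Which errno a given kernel actually returns depends on its version and on the socket family, so the sketch reports the value rather than assuming it.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <termios.h>
#include <unistd.h>

/* How a caller might interpret errno after a failed ioctl(). */
enum ioctl_verdict {
	IOCTL_NOT_SUPPORTED,	/* ENOTTY: request does not apply to this object */
	IOCTL_BAD_ARGUMENT,	/* EINVAL: request or its argument is invalid */
	IOCTL_OTHER_ERROR
};

/* Hypothetical helper: map an errno value to one of the verdicts above. */
enum ioctl_verdict classify_ioctl_errno(int err)
{
	switch (err) {
	case ENOTTY:
		return IOCTL_NOT_SUPPORTED;
	case EINVAL:
		return IOCTL_BAD_ARGUMENT;
	default:
		return IOCTL_OTHER_ERROR;
	}
}

/*
 * Hypothetical probe: issue a terminal-only request (TCGETS) on a UDP
 * socket. Returns the errno of the failed ioctl(), 0 if the call
 * unexpectedly succeeded, or -1 if no socket could be created.
 */
int probe_socket_ioctl(void)
{
	struct termios tio;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	int err;

	if (fd < 0)
		return -1;

	/* Save errno before close() has a chance to overwrite it. */
	err = ioctl(fd, TCGETS, &tio) < 0 ? errno : 0;
	close(fd);
	return err;
}
```

A caller could pass the result of probe_socket_ioctl() to classify_ioctl_errno() to tell "this request does not apply to sockets" apart from "the argument was invalid", which is the difference the bug report is about.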

* Re: [PATCH 6/6 v8] sctp: Add ASCONF operation on the single-homed host
From: Michio Honda @ 2011-04-28  8:04 UTC (permalink / raw)
  To: Wei Yongjun; +Cc: netdev, YOSHIFUJI Hideaki
In-Reply-To: <4DB915EF.8000003@cn.fujitsu.com>

>> 
>>>> +			/* reuse the parameter length from the same scope one */
>>>> +			totallen += paramlen;
>>>> +			totallen += addr_param_len;
>>>> +			del_pickup = 1;
>>>> +			asoc->src_out_of_asoc_ok = 1;
>>> src_out_of_asoc_ok should be marked when the last address is
>>> assigned to asoc->asconf_addr_del_pending?
>> I thought src_out_of_asoc_ok should be set when the candidate new address appears, rather than when the last address is being deleted.
>> Until such an address appears, there is no situation in which a chunk would be sent from an address not in the association.
> 
> The last address has been marked as DEL and will never be used
> for sending chunks. At this point there is no valid address on the
> host, so the host cannot send any chunk.
I understand that marking out_of_asoc_ok when the last address is deleted achieves the same thing.
However, the out_of_asoc_ok state is not a regular one, so I think we should keep its duration as short as possible.
This is only my personal opinion, so if you disagree, I will mark it when the last address is being deleted.
> 
>>>> +			SCTP_DEBUG_PRINTK("mkasconf_update_ip: picked same-scope del_pending addr, totallen for all addresses is %d\n", totallen);
>>>> +		}
>>>> 	}
>>>> 
>>>> 	/* Create an asconf chunk with the required length. */
>>>> @@ -2802,6 +2819,19 @@ struct sctp_chunk *sctp_make_asconf_update_ip(struct sctp_association *asoc,
>>>> 
>>>> 		addr_buf += af->sockaddr_len;
>>>> 	}
>>>> +	if (flags == SCTP_PARAM_ADD_IP && del_pickup) {
>>>> +		addr = asoc->asconf_addr_del_pending;
>>>> +		del_af = sctp_get_af_specific(addr->v4.sin_family);
>>>> +		del_addr_param_len = del_af->to_addr_param(addr,
>>>> +		    &del_addr_param);
>>>> +		del_param.param_hdr.type = SCTP_PARAM_DEL_IP;
>>>> +		del_param.param_hdr.length = htons(del_paramlen +
>>>> +		    del_addr_param_len);
>>>> +		del_param.crr_id = i;
>>>> +
>>>> +		sctp_addto_chunk(retval, del_paramlen, &del_param);
>>>> +		sctp_addto_chunk(retval, del_addr_param_len, &del_addr_param);
>>>> +	}
>>>> 	return retval;
>>>> }
>>> How about clean up this part as:
>>> 
>>> +       if (...) {
>>> +               addr = asoc->asconf_addr_del_pending;
>>> +               af = sctp_get_af_specific(addr->v4.sin_family);
>>> +               addr_param_len = af->to_addr_param(addr, &addr_param);
>>> +               totallen += paramlen;
>>> +               totallen += addr_param_len;
>>> +		...
>>> +       }
>>> +
>>>      /* Create an asconf chunk with the required length. */
>>>      retval = sctp_make_asconf(asoc, laddr, totallen);
>>>      if (!retval)
>>> @@ -2802,6 +2812,18 @@ struct sctp_chunk *sctp_make_asconf_update_ip(struct sctp_association *asoc,
>>> 
>>>              addr_buf += af->sockaddr_len;
>>>      }
>>> +
>>> +       if (...) {
>>> +               addr = asoc->asconf_addr_del_pending;
>>> +               af = sctp_get_af_specific(addr->v4.sin_family);
>>> +               addr_param_len = af->to_addr_param(addr, &addr_param);
>>> +               param.param_hdr.type = SCTP_PARAM_DEL_IP;
>>> +               param.param_hdr.length = htons(paramlen + addr_param_len);
>>> +               param.crr_id = i;
>>> +
>>> +               sctp_addto_chunk(retval, paramlen, &param);
>>> +               sctp_addto_chunk(retval, addr_param_len, &addr_param);
>>> +       }
>> Agreed on reusing af instead of defining del_af.


^ permalink raw reply

* Re: [PATCHv4 2/7] ethtool: Call ethtool's get/set_settings callbacks with cleaned data
From: Stanislaw Gruszka @ 2011-04-28  7:34 UTC (permalink / raw)
  To: David Decotigny
  Cc: Grant Grundler, e1000-devel, linux-kernel, Eilon Greenstein,
	mirq-linux, netdev, Ben Hutchings, David S. Miller
In-Reply-To: <1303965163-8198-3-git-send-email-decot@google.com>

On Wed, Apr 27, 2011 at 09:32:38PM -0700, David Decotigny wrote:
> --- a/drivers/net/stmmac/stmmac_ethtool.c
> +++ b/drivers/net/stmmac/stmmac_ethtool.c
> @@ -237,13 +237,12 @@ stmmac_set_pauseparam(struct net_device *netdev,
>  
>  	if (phy->autoneg) {
>  		if (netif_running(netdev)) {
> -			struct ethtool_cmd cmd;
> +			struct ethtool_cmd cmd = { .cmd = ETHTOOL_SSET };
>  			/* auto-negotiation automatically restarted */
> -			cmd.cmd = ETHTOOL_NWAY_RST;

Why did you change ETHTOOL_NWAY_RST to ETHTOOL_SSET ?



^ permalink raw reply

* Is 802.3ad mode in bonding useful ?
From: WeipingPan @ 2011-04-28  7:33 UTC (permalink / raw)
  To: netdev

Hi, all,

The 802.3ad mode in bonding implements the 802.3ad standard.

I am just wondering whether 802.3ad mode is useful,
since bonding has many other modes such as balance-rr, active-backup, etc.

thanks
Weiping Pan

^ permalink raw reply

* Re: [PATCH 6/6 v8] sctp: Add ASCONF operation on the single-homed host
From: Wei Yongjun @ 2011-04-28  7:23 UTC (permalink / raw)
  To: Michio Honda; +Cc: netdev, YOSHIFUJI Hideaki
In-Reply-To: <8B2DF56A-11D6-41F6-85E5-01AA47C0F774@sfc.wide.ad.jp>



> Hi, 
>
>>>
>>> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
>>> index 3740603..6363c46 100644
>>> --- a/net/sctp/sm_make_chunk.c
>>> +++ b/net/sctp/sm_make_chunk.c
>>> @@ -2768,6 +2768,12 @@ struct sctp_chunk *sctp_make_asconf_update_ip(struct sctp_association *asoc,
>>> 	int			addr_param_len = 0;
>>> 	int 			totallen = 0;
>>> 	int 			i;
>>> +	sctp_addip_param_t del_param; /* 8 Bytes (Type 0xC002, Len and CrrID) */
>>> +	struct sctp_af *del_af;
>>> +	int del_addr_param_len = 0;
>>> +	int del_paramlen = sizeof(sctp_addip_param_t);
>>> +	union sctp_addr_param del_addr_param; /* (v4) 8 Bytes, (v6) 20 Bytes */
>>> +	int			del_pickup = 0;
>>>
>>> 	/* Get total length of all the address parameters. */
>>> 	addr_buf = addrs;
>>> @@ -2780,6 +2786,17 @@ struct sctp_chunk *sctp_make_asconf_update_ip(struct sctp_association *asoc,
>>> 		totallen += addr_param_len;
>>>
>>> 		addr_buf += af->sockaddr_len;
>>> +		if (asoc->asconf_addr_del_pending && !del_pickup) {
>>> +			if (!sctp_in_scope(asoc->asconf_addr_del_pending,
>>> +			    sctp_scope(addr)))
>>> +				continue;
>> Scope should only be checked before an address is added to the asoc.
>> Deletion does not need this, since we have already checked that the
>> address being deleted exists in the asoc.
> Exactly. I will remove this check and add code to check the scope when adding an address in sctp_asconf_mgmt().
>> You can test with the last test commands I sent you, but
>> ifdown/ifup the lo device instead.
>> $ ifdown lo && ifup lo
> Ah, I see. I will try.
>>> +			/* reuse the parameter length from the same scope one */
>>> +			totallen += paramlen;
>>> +			totallen += addr_param_len;
>>> +			del_pickup = 1;
>>> +			asoc->src_out_of_asoc_ok = 1;
>> src_out_of_asoc_ok should be marked when the last address is
>> assigned to asoc->asconf_addr_del_pending?
> I thought src_out_of_asoc_ok should be set when the candidate new address appears, rather than when the last address is being deleted.
> Until such an address appears, there is no situation in which a chunk would be sent from an address not in the association.

The last address has been marked as DEL and will never be used
for sending chunks. At this point there is no valid address on the
host, so the host cannot send any chunk.

>>> +			SCTP_DEBUG_PRINTK("mkasconf_update_ip: picked same-scope del_pending addr, totallen for all addresses is %d\n", totallen);
>>> +		}
>>> 	}
>>>
>>> 	/* Create an asconf chunk with the required length. */
>>> @@ -2802,6 +2819,19 @@ struct sctp_chunk *sctp_make_asconf_update_ip(struct sctp_association *asoc,
>>>
>>> 		addr_buf += af->sockaddr_len;
>>> 	}
>>> +	if (flags == SCTP_PARAM_ADD_IP && del_pickup) {
>>> +		addr = asoc->asconf_addr_del_pending;
>>> +		del_af = sctp_get_af_specific(addr->v4.sin_family);
>>> +		del_addr_param_len = del_af->to_addr_param(addr,
>>> +		    &del_addr_param);
>>> +		del_param.param_hdr.type = SCTP_PARAM_DEL_IP;
>>> +		del_param.param_hdr.length = htons(del_paramlen +
>>> +		    del_addr_param_len);
>>> +		del_param.crr_id = i;
>>> +
>>> +		sctp_addto_chunk(retval, del_paramlen, &del_param);
>>> +		sctp_addto_chunk(retval, del_addr_param_len, &del_addr_param);
>>> +	}
>>> 	return retval;
>>> }
>> How about clean up this part as:
>>
>> +       if (...) {
>> +               addr = asoc->asconf_addr_del_pending;
>> +               af = sctp_get_af_specific(addr->v4.sin_family);
>> +               addr_param_len = af->to_addr_param(addr, &addr_param);
>> +               totallen += paramlen;
>> +               totallen += addr_param_len;
>> +		...
>> +       }
>> +
>>       /* Create an asconf chunk with the required length. */
>>       retval = sctp_make_asconf(asoc, laddr, totallen);
>>       if (!retval)
>> @@ -2802,6 +2812,18 @@ struct sctp_chunk *sctp_make_asconf_update_ip(struct sctp_association *asoc,
>>
>>               addr_buf += af->sockaddr_len;
>>       }
>> +
>> +       if (...) {
>> +               addr = asoc->asconf_addr_del_pending;
>> +               af = sctp_get_af_specific(addr->v4.sin_family);
>> +               addr_param_len = af->to_addr_param(addr, &addr_param);
>> +               param.param_hdr.type = SCTP_PARAM_DEL_IP;
>> +               param.param_hdr.length = htons(paramlen + addr_param_len);
>> +               param.crr_id = i;
>> +
>> +               sctp_addto_chunk(retval, paramlen, &param);
>> +               sctp_addto_chunk(retval, addr_param_len, &addr_param);
>> +       }
> Agreed on reusing af instead of defining del_af.

^ permalink raw reply

* Re: [PATCH 6/6 v8] sctp: Add ASCONF operation on the single-homed host
From: Michio Honda @ 2011-04-28  7:04 UTC (permalink / raw)
  To: Wei Yongjun; +Cc: netdev, YOSHIFUJI Hideaki
In-Reply-To: <4DB8F287.2070000@cn.fujitsu.com>

Hi, 

>> 
>> 
>> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
>> index 3740603..6363c46 100644
>> --- a/net/sctp/sm_make_chunk.c
>> +++ b/net/sctp/sm_make_chunk.c
>> @@ -2768,6 +2768,12 @@ struct sctp_chunk *sctp_make_asconf_update_ip(struct sctp_association *asoc,
>> 	int			addr_param_len = 0;
>> 	int 			totallen = 0;
>> 	int 			i;
>> +	sctp_addip_param_t del_param; /* 8 Bytes (Type 0xC002, Len and CrrID) */
>> +	struct sctp_af *del_af;
>> +	int del_addr_param_len = 0;
>> +	int del_paramlen = sizeof(sctp_addip_param_t);
>> +	union sctp_addr_param del_addr_param; /* (v4) 8 Bytes, (v6) 20 Bytes */
>> +	int			del_pickup = 0;
>> 
>> 	/* Get total length of all the address parameters. */
>> 	addr_buf = addrs;
>> @@ -2780,6 +2786,17 @@ struct sctp_chunk *sctp_make_asconf_update_ip(struct sctp_association *asoc,
>> 		totallen += addr_param_len;
>> 
>> 		addr_buf += af->sockaddr_len;
>> +		if (asoc->asconf_addr_del_pending && !del_pickup) {
>> +			if (!sctp_in_scope(asoc->asconf_addr_del_pending,
>> +			    sctp_scope(addr)))
>> +				continue;
> 
> Scope should only be checked before an address is added to the asoc.
> Deletion does not need this, since we have already checked that the
> address being deleted exists in the asoc.
Exactly. I will remove this check and add code to check the scope when adding an address in sctp_asconf_mgmt().
> 
> You can test with the last test commands I sent you, but
> ifdown/ifup the lo device instead.
> $ ifdown lo && ifup lo
Ah, I see. I will try.
> 
>> +			/* reuse the parameter length from the same scope one */
>> +			totallen += paramlen;
>> +			totallen += addr_param_len;
>> +			del_pickup = 1;
>> +			asoc->src_out_of_asoc_ok = 1;
> 
> src_out_of_asoc_ok should be marked when the last address is
> assigned to asoc->asconf_addr_del_pending?
I thought src_out_of_asoc_ok should be set when the candidate new address appears, rather than when the last address is being deleted.
Until such an address appears, there is no situation in which a chunk would be sent from an address not in the association.
> 
>> +			SCTP_DEBUG_PRINTK("mkasconf_update_ip: picked same-scope del_pending addr, totallen for all addresses is %d\n", totallen);
>> +		}
>> 	}
>> 
>> 	/* Create an asconf chunk with the required length. */
>> @@ -2802,6 +2819,19 @@ struct sctp_chunk *sctp_make_asconf_update_ip(struct sctp_association *asoc,
>> 
>> 		addr_buf += af->sockaddr_len;
>> 	}
>> +	if (flags == SCTP_PARAM_ADD_IP && del_pickup) {
>> +		addr = asoc->asconf_addr_del_pending;
>> +		del_af = sctp_get_af_specific(addr->v4.sin_family);
>> +		del_addr_param_len = del_af->to_addr_param(addr,
>> +		    &del_addr_param);
>> +		del_param.param_hdr.type = SCTP_PARAM_DEL_IP;
>> +		del_param.param_hdr.length = htons(del_paramlen +
>> +		    del_addr_param_len);
>> +		del_param.crr_id = i;
>> +
>> +		sctp_addto_chunk(retval, del_paramlen, &del_param);
>> +		sctp_addto_chunk(retval, del_addr_param_len, &del_addr_param);
>> +	}
>> 	return retval;
>> }
> 
> How about clean up this part as:
> 
> +       if (...) {
> +               addr = asoc->asconf_addr_del_pending;
> +               af = sctp_get_af_specific(addr->v4.sin_family);
> +               addr_param_len = af->to_addr_param(addr, &addr_param);
> +               totallen += paramlen;
> +               totallen += addr_param_len;
> +		...
> +       }
> +
>       /* Create an asconf chunk with the required length. */
>       retval = sctp_make_asconf(asoc, laddr, totallen);
>       if (!retval)
> @@ -2802,6 +2812,18 @@ struct sctp_chunk *sctp_make_asconf_update_ip(struct sctp_association *asoc,
> 
>               addr_buf += af->sockaddr_len;
>       }
> +
> +       if (...) {
> +               addr = asoc->asconf_addr_del_pending;
> +               af = sctp_get_af_specific(addr->v4.sin_family);
> +               addr_param_len = af->to_addr_param(addr, &addr_param);
> +               param.param_hdr.type = SCTP_PARAM_DEL_IP;
> +               param.param_hdr.length = htons(paramlen + addr_param_len);
> +               param.crr_id = i;
> +
> +               sctp_addto_chunk(retval, paramlen, &param);
> +               sctp_addto_chunk(retval, addr_param_len, &addr_param);
> +       }
Agreed on reusing af instead of defining del_af.
> 


^ permalink raw reply
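
The length arithmetic discussed in this thread (an 8-byte ADDIP header of type 0xC002 followed by an 8-byte IPv4 or 20-byte IPv6 address parameter, with the sum stored via htons()) can be worked through in user space. This is only a sketch under stated assumptions: build_del_ipv4_param() and the macro names are hypothetical, not kernel API, and the IPv4 address parameter type value 5 is taken from the SCTP specification rather than from the patch.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PARAM_DEL_IP	0xC002u	/* Delete IP Address, per the patch comment */
#define PARAM_IPV4_ADDR	5u	/* IPv4 Address parameter type (assumed from the SCTP spec) */
#define ADDIP_HDR_LEN	8u	/* type (2) + length (2) + correlation id (4) */
#define IPV4_PARAM_LEN	8u	/* type (2) + length (2) + address (4) */

/*
 * Hypothetical helper: write one Delete-IP parameter carrying an IPv4
 * address into buf. addr_be must already be in network byte order.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
size_t build_del_ipv4_param(uint8_t *buf, size_t buflen,
			    uint32_t crr_id, uint32_t addr_be)
{
	const size_t total = ADDIP_HDR_LEN + IPV4_PARAM_LEN;	/* 8 + 8 = 16 */
	uint16_t v16;

	if (buflen < total)
		return 0;

	/* ADDIP header: the length covers the header and the address parameter. */
	v16 = htons(PARAM_DEL_IP);
	memcpy(buf + 0, &v16, sizeof(v16));
	v16 = htons((uint16_t)total);
	memcpy(buf + 2, &v16, sizeof(v16));
	memcpy(buf + 4, &crr_id, sizeof(crr_id));	/* opaque, copied as given */

	/* Address parameter: its own type and length, then the address. */
	v16 = htons(PARAM_IPV4_ADDR);
	memcpy(buf + 8, &v16, sizeof(v16));
	v16 = htons(IPV4_PARAM_LEN);
	memcpy(buf + 10, &v16, sizeof(v16));
	memcpy(buf + 12, &addr_be, sizeof(addr_be));

	return total;
}
```

For an IPv6 address the same layout would give 8 + 20 = 28 bytes, which is why the chunk's total length has to be accumulated before the chunk is allocated, as in the cleanup suggested above.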

* Re: Maximum no of bytes Ethernet can transfer at a time ??
From: Eric Dumazet @ 2011-04-28  6:40 UTC (permalink / raw)
  To: Ajit; +Cc: netdev
In-Reply-To: <loom.20110428T083015-693@post.gmane.org>

Le jeudi 28 avril 2011 à 06:36 +0000, Ajit a écrit :
> Ajit <ajitsa_bes <at> yahoo.com> writes:
> 
> > 
> > Eric Dumazet <eric.dumazet <at> gmail.com> writes:
> > 
> > > Sure, check your syscall returns values, and search for SO_RCVBUF &
> > > SO_SNDBUF   (man 7 socket)
> > > 
> > > --
> > 
> > okies..I dont know exactly how to use those but I will google and try it..
> > will post my result after some time.
> > 
> > Thank you
> > 
> > 
> > 
> 
> 
> Hi sir,
> I tried something out as you suggested.
> 
> I added these lines to my code:
> 
> getsockopt(s,SOL_SOCKET,SO_RCVBUF,&optval,&optlen);
> printf("The value of optlen is %d\n",optlen);
> 

Oh well, netdev is really not the place for a discussion like that.

int optval;
socklen_t optlen = sizeof(optval);
getsockopt(s,SOL_SOCKET,SO_RCVBUF, &optval, &optlen);
printf("The value of optval is %d\n", optval);

> It always displays 4.
> 
> I really have no idea about it. Do I need to use setsockopt options in my
> receiver code ??
> If so, what difference will it make ??

It makes sure your socket buffer is large enough to hold the incoming
messages; otherwise the kernel drops them.
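
A fuller version of the snippet above, as a minimal sketch: the original code printed optlen, which is just the size of the option value (4 bytes for an int), not the buffer size held in optval. The helper names get_rcvbuf(), set_rcvbuf() and show_rcvbuf() are hypothetical and only wrap the standard getsockopt()/setsockopt() calls with return-value checks.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Read SO_RCVBUF into *bytes. Returns 0 on success, -1 on error (errno set). */
int get_rcvbuf(int fd, int *bytes)
{
	/* In: size of the buffer we pass. Out: size of the value written. */
	socklen_t len = sizeof(*bytes);

	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, bytes, &len) < 0)
		return -1;
	return 0;
}

/* Request a receive buffer of `bytes`; the kernel may adjust or cap it. */
int set_rcvbuf(int fd, int bytes)
{
	return setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes));
}

/* Print the buffer size before and after a request. Returns 0 on success. */
int show_rcvbuf(int fd, int wanted)
{
	int before, after;

	if (get_rcvbuf(fd, &before) < 0 || set_rcvbuf(fd, wanted) < 0 ||
	    get_rcvbuf(fd, &after) < 0) {
		perror("SO_RCVBUF");
		return -1;
	}
	printf("SO_RCVBUF: %d -> %d bytes (requested %d)\n",
	       before, after, wanted);
	return 0;
}
```

Calling show_rcvbuf(s, 1 << 20) on the receiver's socket would print the size before and after the request. On Linux the value read back is usually not the value requested: per socket(7), the kernel doubles it to allow for bookkeeping overhead and caps it at net.core.rmem_max.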




^ permalink raw reply

* Re: Maximum no of bytes Ethernet can transfer at a time ??
From: Ajit @ 2011-04-28  6:36 UTC (permalink / raw)
  To: netdev
In-Reply-To: <loom.20110427T142551-736@post.gmane.org>

Ajit <ajitsa_bes <at> yahoo.com> writes:

> 
> Eric Dumazet <eric.dumazet <at> gmail.com> writes:
> 
> > Sure, check your syscall returns values, and search for SO_RCVBUF &
> > SO_SNDBUF   (man 7 socket)
> > 
> > --
> 
> okies..I dont know exactly how to use those but I will google and try it..
> will post my result after some time.
> 
> Thank you
> 
> 
> 

Hi sir,
I tried something out as you suggested.

I added these lines to my code:

getsockopt(s,SOL_SOCKET,SO_RCVBUF,&optval,&optlen);
printf("The value of optlen is %d\n",optlen);

It always displays 4.

I really have no idea about it. Do I need to use setsockopt options in my
receiver code ??
If so, what difference will it make ??

^ permalink raw reply

* Re: [PATCH 08/13] netvm: Allow skb allocation to use PFMEMALLOC reserves
From: NeilBrown @ 2011-04-28  6:19 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Peter Zijlstra
In-Reply-To: <1303920491-25302-9-git-send-email-mgorman@suse.de>

On Wed, 27 Apr 2011 17:08:06 +0100 Mel Gorman <mgorman@suse.de> wrote:


> @@ -1578,7 +1589,7 @@ static inline struct sk_buff *netdev_alloc_skb_ip_align(struct net_device *dev,
>   */
>  static inline struct page *__netdev_alloc_page(struct net_device *dev, gfp_t gfp_mask)
>  {
> -	return alloc_pages_node(NUMA_NO_NODE, gfp_mask, 0);
> +	return alloc_pages_node(NUMA_NO_NODE, gfp_mask | __GFP_MEMALLOC, 0);
>  }
>  

I'm puzzling a bit over this change.
__netdev_alloc_page appears to be used to get pages to put in ring buffer
for a network card to DMA received packets into.  So it is OK to use
__GFP_MEMALLOC for these allocations providing we mark the resulting skb as
'pfmemalloc' if a reserved page was used.

However I don't see where that marking is done.
I think it should be in skb_fill_page_desc, something like:

  if (page->pfmemalloc)
	skb->pfmemalloc = true;

Is this covered somewhere else that I am missing?

Thanks,
NeilBrown


^ permalink raw reply
