Netdev List
 help / color / mirror / Atom feed
* [PATCH net-next-2.6] sfq: fix sfq stats handling
From: Eric Dumazet @ 2010-12-22  6:14 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Jarek Poplawski

David, since this one fixes a bug, could you apply it before the pending
SFQ patch (sch_sfq: allow big packets and be fair), I'll respin this one
if necessary.

Thanks

[PATCH net-next-2.6] sfq: fix sfq class stats handling

sfq_walk() runs without qdisc lock. By the time it selects a non empty
hash slot and sfq_dump_class_stats() is run (with lock held), slot might
have been freed : We then access q->slots[SFQ_EMPTY_SLOT], out of
bounds, and crash in slot_queue_walk()

On previous kernels, bug is here but out of bounds qs[SFQ_DEPTH] and
allot[SFQ_DEPTH] are located in struct sfq_sched_data, so no illegal
memory access happens, only possibly wrong data reported to user.

Also, slot_dequeue_tail() should make sure slot skb chain is correctly
terminated, or sfq_dump_class_stats() can access freed skbs.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jarek Poplawski <jarkao2@gmail.com>
---
 net/sched/sch_sfq.c |   16 +++++++++++-----
 1 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 13322e8..6a2f88f 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -281,6 +281,7 @@ static inline struct sk_buff *slot_dequeue_tail(struct sfq_slot *slot)
 	struct sk_buff *skb = slot->skblist_prev;
 
 	slot->skblist_prev = skb->prev;
+	skb->prev->next = (struct sk_buff *)slot;
 	skb->next = skb->prev = NULL;
 	return skb;
 }
@@ -608,14 +609,19 @@ static int sfq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 				struct gnet_dump *d)
 {
 	struct sfq_sched_data *q = qdisc_priv(sch);
-	const struct sfq_slot *slot = &q->slots[q->ht[cl - 1]];
-	struct gnet_stats_queue qs = { .qlen = slot->qlen };
-	struct tc_sfq_xstats xstats = { .allot = slot->allot };
+	sfq_index idx = q->ht[cl - 1];
+	struct gnet_stats_queue qs = { 0 };
+	struct tc_sfq_xstats xstats = { 0 };
 	struct sk_buff *skb;
 
-	slot_queue_walk(slot, skb)
-		qs.backlog += qdisc_pkt_len(skb);
+	if (idx != SFQ_EMPTY_SLOT) {
+		const struct sfq_slot *slot = &q->slots[idx];
 
+		xstats.allot = slot->allot;
+		qs.qlen = slot->qlen;
+		slot_queue_walk(slot, skb)
+			qs.backlog += qdisc_pkt_len(skb);
+	}
 	if (gnet_stats_copy_queue(d, &qs) < 0)
 		return -1;
 	return gnet_stats_copy_app(d, &xstats, sizeof(xstats));



^ permalink raw reply related

* how to make sure that packet is inserted into PACKET_MMAP ring buffer strictly according to its timestamp order?
From: Rui @ 2010-12-22  4:33 UTC (permalink / raw)
  To: netdev

hi

how to make sure that packet is inserted into PACKET_MMAP ring buffer
strictly according to its timestamp order?

I am studying the code below
as the function tpacket_rcv will be running in parallel in multi queue
nic supportted kernel,
is it possible that a later packet get the spinlock earlier,
thus it is copied to the ring buffer prior to packet which timestamp is earlier?


thanks
rui

af_packet.c
static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
		       struct packet_type *pt, struct net_device *orig_dev)
{
...
spin_lock(&sk->sk_receive_queue.lock);
	h.raw = packet_current_frame(po, &po->rx_ring, TP_STATUS_KERNEL);
	if (!h.raw)
		goto ring_is_full;
	packet_increment_head(&po->rx_ring);
	po->stats.tp_packets++;
	if (copy_skb) {
		status |= TP_STATUS_COPY;
		__skb_queue_tail(&sk->sk_receive_queue, copy_skb);
	}
	if (!po->stats.tp_drops)
		status &= ~TP_STATUS_LOSING;
	spin_unlock(&sk->sk_receive_queue.lock);

	skb_copy_bits(skb, 0, h.raw + macoff, snaplen);
}

^ permalink raw reply

* Re: [PATCH] MAINTAINERS: email address change
From: David Miller @ 2010-12-22  3:58 UTC (permalink / raw)
  To: pcnet32; +Cc: netdev
In-Reply-To: <1292988905.9055.14.camel@Linux>

From: Don Fry <pcnet32@frontier.com>
Date: Tue, 21 Dec 2010 19:35:05 -0800

> My ISP has changed and therefore my email address.
> 
> Signed-off-by: Don Fry <pcnet32@frontier.com>

Applied, thanks.

^ permalink raw reply

* [PATCH] MAINTAINERS:  email address change
From: Don Fry @ 2010-12-22  3:35 UTC (permalink / raw)
  To: David Miller, netdev

My ISP has changed and therefore my email address.

Signed-off-by:  Don Fry <pcnet32@frontier.com>

--- linux-2.6.37-rc7/MAINTAINERS.orig	2010-12-21 11:26:40.000000000 -0800
+++ linux-2.6.37-rc7/MAINTAINERS	2010-12-21 19:20:37.304116312 -0800
@@ -4590,7 +4590,7 @@ F:	drivers/pcmcia/
 F:	include/pcmcia/
 
 PCNET32 NETWORK DRIVER
-M:	Don Fry <pcnet32@verizon.net>
+M:	Don Fry <pcnet32@frontier.com>
 L:	netdev@vger.kernel.org
 S:	Maintained
 F:	drivers/net/pcnet32.c



^ permalink raw reply

* RE: [PATCH net-next-2.6 v2 1/1] can: c_can: Added support for Bosch C_CAN controller
From: Bhupesh SHARMA @ 2010-12-22  3:36 UTC (permalink / raw)
  To: Wolfgang Grandegger
  Cc: Socketcan-core-0fE9KPoRgkgATYTw5x5z8w@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Marc Kleine-Budde
In-Reply-To: <4D10FF97.1070703-5Yr1BZd7O62+XT7JhA+gdA@public.gmane.org>

Hi Wolfgang,

> Hi Bhupesh,
> 
> On 12/21/2010 05:48 AM, Bhupesh SHARMA wrote:
> > Hi Wolfgang,
> ...
> >> In the meantime I compared the CAN chapter of the PCH manual with
> the
> >> C_CAN manual. The paragraphs I checked are *identical*. This makes
> >> clear, that the "pch_can" is a clone of the  C_CAN CAN controller,
> with
> >> a few extensions, though. Therefore it would make sense, to
> implement a
> >> bus sensitive interface like for the SJA1000 allowing to handle both
> >> CAN
> >> controllers with one driver sooner than later. Therefore, could you
> >> please implement:
> >>
> >>   drivers/net/can/c_can/c_can.c
> >>                        /c_can_platform.c
> >>
> >> Then an interface to the PCI based PCH CAN controller could be added
> >> easily, e.g. as "pch_pci.c". You already had something similar in
> your
> >> RFC version of the patch, IIRC.
> >
> > This was the approach I initially proposed in my RFC V1 patch :)
> > But unfortunately we could not agree to it.
> 
> I know. But at that time I was not aware of any other bus used for the
> C_CAN controller.
> 
> > So, please let me reiterate what I understood and what was present
> > in RFC version of the patch. Please add your comments/views:
> >
> >         - drivers/net/can/c_can/c_can.c (similar on lines of
> sja1000.c)
> >         i.e. a)no *probe* / *remove* functions here,
> >              b)register read/write implemented here.
> >
> >         - drivers/net/can/c_can/c_can_platform.c (similar on lines of
> sja1000_platform.c)
> >         i.e. *probe* / *remove* implemented here,
> 
> Yes, that's what I'm thinking about.
> 
> > Marc and Tomoya can also add their suggestions so that I can finalize
> V3 a.s.a.p.
> 
> That would be nice, indeed. Also have a look to Tomoya's PCH driver,
> which also looks very good in the meantime.

I am having a look at Tomoya's PCH driver, but as I mentioned in 
RFC V1 patch, I would rather like to have a bus sensitive `c_can` driver
on top of which we can have the platform driver `c_can_platform` which
essentially caters to the details of registers mapping/arch differences.
Any other functionality like USB/PCI should be present in a separate file
like `usb_c_can.c` or `pci_c_can.c` 

If you agree I will try to circulate V3 a.s.ap.

Regards,
Bhupesh

^ permalink raw reply

* Re: [PATCH 5/5 v4] net: add old_queue_mapping into skb->cb
From: Changli Gao @ 2010-12-22  0:08 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: hadi, David S. Miller, Stephen Hemminger, Tom Herbert, Jiri Pirko,
	netdev, netem
In-Reply-To: <1292945075.2720.32.camel@edumazet-laptop>

On Tue, Dec 21, 2010 at 11:24 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Do we really want a multi queue ifb at all ?
>
> Why not use percpu data and LLTX, like we did for other virtual devices
> (loopback, tunnels, vlans, ...)
>
> I guess most ifb uses need to finaly deliver packets in a monoqueue
> anyway, optimizing ifb might raise lock contention on this resource.
>

For ingress shaping, the later processing should be in parallel, so
multiple ri_tasklets are needed.

For one queue ifb, the queue mapping should be saved and restored for
ingress skbs, as it is reset to 0 in dev_pick_tx(ifb).


-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: [patch -next] pch_can: off by one bugs
From: Tomoya MORINAGA @ 2010-12-21 23:58 UTC (permalink / raw)
  To: Wolfgang Grandegger, Dan Carpenter
  Cc: socketcan-core-0fE9KPoRgkgATYTw5x5z8w,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	kernel-janitors-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <4D111110.2030609@grandegger.com>

On Wednesday, December 22, 2010 5:41 AM, Wolfgang Grandegger wrote:
> This fix does not look correct too me.
I agree with Wolfgang's comments.

> Dan
Thank you for your report.
I will post the patch by this week.

Thanks,
---
Tomoya MORINAGA
OKI SEMICONDUCTOR CO., LTD.

----- Original Message ----- 
From: "Wolfgang Grandegger" <wg-5Yr1BZd7O62+XT7JhA+gdA@public.gmane.org>
To: "Dan Carpenter" <error27-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: <socketcan-core-0fE9KPoRgkgATYTw5x5z8w@public.gmane.org>; <netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>; 
<kernel-janitors-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>; "Tomoya MORINAGA" 
<tomoya-linux-ECg8zkTtlr0C6LszWs/t0g@public.gmane.org>
Sent: Wednesday, December 22, 2010 5:41 AM
Subject: Re: [patch -next] pch_can: off by one bugs


> Hello,
>
> On 12/20/2010 10:26 AM, Dan Carpenter wrote:
>> priv->tx_enable[] has PCH_TX_OBJ_END elements so this code is
>> reading and writing one past the end of the array.
>>
>> Signed-off-by: Dan Carpenter <error27-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>>
>> diff --git a/drivers/net/can/pch_can.c b/drivers/net/can/pch_can.c
>> index 8d45fdd..b2c1292 100644
>> --- a/drivers/net/can/pch_can.c
>> +++ b/drivers/net/can/pch_can.c
>> @@ -1077,7 +1077,7 @@ static int pch_can_suspend(struct pci_dev *pdev, 
>> pm_message_t state)
>>  pch_can_set_int_enables(priv, PCH_CAN_DISABLE);
>>
>>  /* Save Tx buffer enable state */
>> - for (i = PCH_TX_OBJ_START; i <= PCH_TX_OBJ_END; i++)
>> + for (i = PCH_TX_OBJ_START; i < PCH_TX_OBJ_END; i++)
>>  priv->tx_enable[i] = pch_can_get_rxtx_ir(priv, i, PCH_TX_IFREG);
>>
>>  /* Disable all Transmit buffers */
>> @@ -1138,7 +1138,7 @@ static int pch_can_resume(struct pci_dev *pdev)
>>  pch_can_set_optmode(priv);
>>
>>  /* Enabling the transmit buffer. */
>> - for (i = PCH_TX_OBJ_START; i <= PCH_TX_OBJ_END; i++)
>> + for (i = PCH_TX_OBJ_START; i < PCH_TX_OBJ_END; i++)
>>  pch_can_set_rxtx(priv, i, priv->tx_enable[i], PCH_TX_IFREG);
>>
>>  /* Configuring the receive buffer and enabling them. */
>>
>
> This fix does not look correct too me. There are much more loop using
> "i <= PCH_TX_OBJ_END" and the message numbering is from 1..32. Therefore
> using "priv->tx_enable[i - 1]" seems more appropriate to me. Tomaya,
> could you please check.
>
> Thanks,
>
> Wolfgang.
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* [PATCH] nfs: fix mispelling of idmap CONFIG symbol
From: J. Bruce Fields @ 2010-12-21 23:49 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Kirill A. Shutemov, Neil Brown, Pavel Emelyanov, linux-nfs,
	David S. Miller, netdev, linux-kernel
In-Reply-To: <1292975008.16674.2.camel@heimdal.trondhjem.org>

From: J. Bruce Fields <bfields@redhat.com>

Trivial, but confusing when you're trying to grep through this
code....

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 fs/nfs/idmap.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/idmap.c b/fs/nfs/idmap.c
index 4e2d9b6..1869688 100644
--- a/fs/nfs/idmap.c
+++ b/fs/nfs/idmap.c
@@ -238,7 +238,7 @@ int nfs_map_gid_to_group(struct nfs_client *clp, __u32 gid, char *buf, size_t bu
 	return nfs_idmap_lookup_name(gid, "group", buf, buflen);
 }
 
-#else  /* CONFIG_NFS_USE_IDMAPPER not defined */
+#else  /* CONFIG_NFS_USE_NEW_IDMAPPER not defined */
 
 #include <linux/module.h>
 #include <linux/mutex.h>
-- 
1.7.1


^ permalink raw reply related

* Re: [PATCH] msm: rmnet: msm rmnet smd virtual ethernet driver
From: nvishwan @ 2010-12-21 23:45 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Niranjana Vishwanathapura, netdev, linux-kernel, linux-arm-msm,
	linux-arm-kernel, Brian Swetland
In-Reply-To: <20101215134406.3a634aa8@nehalam>

> On Wed, 15 Dec 2010 10:31:06 -0800
> Niranjana Vishwanathapura <nvishwan@codeaurora.org> wrote:
>
>> +
>> +static DECLARE_TASKLET(smd_net_data_tasklet, smd_net_data_handler, 0);
>> +
>> +static void smd_net_notify(void *_dev, unsigned event)
>> +{
>> +	if (event != SMD_EVENT_DATA)
>> +		return;
>> +
>> +	smd_net_data_tasklet.data = (unsigned long) _dev;
>> +
>> +	tasklet_schedule(&smd_net_data_tasklet);
>> +}
>> +
>
> Rather than having private tasklet, maybe using NAPI
> would be better?
>
> Also since you are already in tasklet, no need to call netif_rx()
> when receiving packet; instead use netif_receive_skb directly.
>
> --
>

NAPI will not buy much as the SMD transport doesn't provide a machanism to
stop interrupts.  I will consider using NAPI in the future (it requires
performance testing on lot of different targets).

However, I can replace netif_rx() by netif_receive_skb() and send out new
patch


^ permalink raw reply

* Re: [PATCH 00/12] make rpc_pipefs be mountable multiple times
From: J. Bruce Fields @ 2010-12-21 23:45 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Trond Myklebust, Neil Brown, Pavel Emelyanov, linux-nfs,
	David S. Miller, netdev, linux-kernel
In-Reply-To: <20101221233215.GC7092@shutemov.name>

On Wed, Dec 22, 2010 at 01:32:15AM +0200, Kirill A. Shutemov wrote:
> On Mon, Dec 20, 2010 at 09:46:44AM -0500, J. Bruce Fields wrote:
> > By the way, was there ever a resolution to Trond's question?:
> > 
> > 	http://marc.info/?l=linux-nfs&m=128655758712817&w=2
> > 
> > 	"The keyring upcalls are currently initiated through the same
> > 	mechanism as module_request and therefore get started with the
> > 	init_nsproxy namespace. We'd really like them to run inside the
> > 	same container as the process.  As part of the same problem,
> > 	there is the issue of what to do with the dns resolver and
> > 	Bryan's new keyring based idmapper code."
> 
> I'm not sure that I understand the problem correctly.
> 
> Currently, idmap uses dentry taken from client's cl_rpcclient->cl_path
> (see nfs_idmap_new()). cl_rpcclient (and cl_path) is initialized with
> rpcmount resolved against mount namespace of mount process (see
> nfs_create_rpc_client()).
> I assume it's correct.

There's actually two separate sets of idmapper code; look at
fs/nfs/idmapper.c, the first part of the file (between #ifdef
CONFIG_NFS_USE_NEW_IDMAPPER and #else) is idmapping code that uses
request_key().  The code you're looking at (including nfs_idmap_new())
is later in the file, and deprecated.

--b.

^ permalink raw reply

* Re: [PATCH] msm: rmnet: msm rmnet smd virtual ethernet driver
From: nvishwan @ 2010-12-21 23:45 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Niranjana Vishwanathapura, netdev, linux-kernel, linux-arm-msm,
	linux-arm-kernel, Brian Swetland
In-Reply-To: <20101215134135.51df5a7f@nehalam>

> On Wed, 15 Dec 2010 10:31:06 -0800
> Niranjana Vishwanathapura <nvishwan@codeaurora.org> wrote:
>
>> +
>> +static void rmnet_tx_timeout(struct net_device *dev)
>> +{
>> +	pr_info("rmnet_tx_timeout()\n");
>> +}
>
> The purpose of having a tx_timeout is to clear the pending
> request.
>
> --
>

Currently, there is no mechanism to clear the transaction.  This message
is helpful in identifying the issue and fixing it.

^ permalink raw reply

* Re: [PATCH] msm: rmnet: msm rmnet smd virtual ethernet driver
From: nvishwan @ 2010-12-21 23:44 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Niranjana Vishwanathapura, netdev, linux-kernel, linux-arm-msm,
	linux-arm-kernel, Brian Swetland
In-Reply-To: <20101215134107.0354c0c3@nehalam>

> On Wed, 15 Dec 2010 10:31:06 -0800
> Niranjana Vishwanathapura <nvishwan@codeaurora.org> wrote:
>
>> +static struct net_device_stats *rmnet_get_stats(struct net_device *dev)
>> +{
>> +	struct rmnet_private *p = netdev_priv(dev);
>> +	return &p->stats;
>> +}
>> +
>
> Just use dev->stats and this function would no longer be needed.
>
> --
>

This function will be a function pointer (ndo_get_stats) in net_device_ops
structure.

^ permalink raw reply

* Re: [PATCH] msm: rmnet: msm rmnet smd virtual ethernet driver
From: nvishwan @ 2010-12-21 23:44 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Niranjana Vishwanathapura, netdev, linux-kernel, linux-arm-msm,
	linux-arm-kernel, Brian Swetland
In-Reply-To: <201012152204.42002.arnd@arndb.de>

> On Wednesday 15 December 2010 19:31:06 Niranjana Vishwanathapura wrote:
>> +struct rmnet_private {
>> +       smd_channel_t *ch;
>> +       struct net_device_stats stats;
>> +       const char *chname;
>> +#ifdef CONFIG_MSM_RMNET_DEBUG
>> +       ktime_t last_packet;
>> +       short active_countdown; /* Number of times left to check */
>> +       short restart_count; /* Number of polls seems so far */
>> +       unsigned long wakeups_xmit;
>> +       unsigned long wakeups_rcv;
>> +       unsigned long timeout_us;
>> +       unsigned long awake_time_ms;
>> +       struct delayed_work work;
>> +#endif
>> +};
>
> It feels like a significant portion of the code and the complexity
> (of which there fortunately is very little otherwise) is in the
> debugging code. How important is that debugging code still?
>
> In my experience, once a driver gets stable enough for inclusion,
> most of the debugging code that you have put in there to write
> the driver becomes obsolete, because the next bug is not going
> to be found with it anyway.
>
> How about deleting the debug code now? You still have the code and
> if something goes wrong, you can always put it back to analyse
> the problem.
>

These debug code proved to be very helpful in debugging with different
baseband images on different targets.  So, it would be very handy to keep it.

But, let me remove it for now, I can send another patch later for debug code.

>> +/* Called in soft-irq context */
>> +static void smd_net_data_handler(unsigned long arg)
>> +{
>> ...
>> +}
>> +
>> +static DECLARE_TASKLET(smd_net_data_tasklet, smd_net_data_handler, 0);
>> +
>> +static void smd_net_notify(void *_dev, unsigned event)
>> +{
>> +       if (event != SMD_EVENT_DATA)
>> +               return;
>> +
>> +       smd_net_data_tasklet.data = (unsigned long) _dev;
>> +
>> +       tasklet_schedule(&smd_net_data_tasklet);
>> +}
>
> It appears strange to do all the receive work in a tasklet. The
> common networking code already has infrastructure for deferring
> the rx to softirq time, and using the NAPI poll() logic likely gives
> you better performance as well.
>

NAPI will not buy much as the SMD transport doesn't provide a machanism to
stop interrupts.  I will consider using NAPI in the future (it requires
performance testing on lot of different targets).

> Aside from these, the driver looks very nice and clean to me.
>
> 	Arnd
>



^ permalink raw reply

* Re: [PATCH 00/12] make rpc_pipefs be mountable multiple times
From: Trond Myklebust @ 2010-12-21 23:43 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: J. Bruce Fields, Neil Brown, Pavel Emelyanov,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA, David S. Miller,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20101221233215.GC7092-oKw7cIdHH8eLwutG50LtGA@public.gmane.org>

On Wed, 2010-12-22 at 01:32 +0200, Kirill A. Shutemov wrote:
> On Mon, Dec 20, 2010 at 09:46:44AM -0500, J. Bruce Fields wrote:
> > By the way, was there ever a resolution to Trond's question?:
> > 
> > 	http://marc.info/?l=linux-nfs&m=128655758712817&w=2
> > 
> > 	"The keyring upcalls are currently initiated through the same
> > 	mechanism as module_request and therefore get started with the
> > 	init_nsproxy namespace. We'd really like them to run inside the
> > 	same container as the process.  As part of the same problem,
> > 	there is the issue of what to do with the dns resolver and
> > 	Bryan's new keyring based idmapper code."
> 
> I'm not sure that I understand the problem correctly.
> 
> Currently, idmap uses dentry taken from client's cl_rpcclient->cl_path
> (see nfs_idmap_new()). cl_rpcclient (and cl_path) is initialized with
> rpcmount resolved against mount namespace of mount process (see
> nfs_create_rpc_client()).
> I assume it's correct.

That would be the legacy idmapper mechanism.

Please see CONFIG_NFS_USE_NEW_IDMAPPER, which uses the keyring upcall
mechanism for greater scalability.


Cheers
  Trond
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org
www.netapp.com

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 00/12] make rpc_pipefs be mountable multiple times
From: Kirill A. Shutemov @ 2010-12-21 23:32 UTC (permalink / raw)
  To: J. Bruce Fields, Trond Myklebust
  Cc: Neil Brown, Pavel Emelyanov, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	David S. Miller, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20101220144644.GC20643-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>

On Mon, Dec 20, 2010 at 09:46:44AM -0500, J. Bruce Fields wrote:
> On Mon, Dec 20, 2010 at 01:54:26PM +0200, Kirill A. Shutsemov wrote:
> > From: Kirill A. Shutemov <kirill-oKw7cIdHH8eLwutG50LtGA@public.gmane.org>
> > 
> > Prepare nfs/sunrpc stack to use multiple instances of rpc_pipefs.
> > Only for client for now.
> > 
> > Only quick sanity check was made. BTW, is there any check list for NFS
> > contributors?
> 
> Nothing formal.
> 
> Chuck may have some nlm patches that conflict; I'm not sure what their
> status is.
> 
> For testing rpc_pipefs, maybe a few mounts (simultaneous mounts might be
> good), possibly over different security flavors, and with some of them
> nfsv4, might be a good idea.

Ok, I'll play with it.

> By the way, was there ever a resolution to Trond's question?:
> 
> 	http://marc.info/?l=linux-nfs&m=128655758712817&w=2
> 
> 	"The keyring upcalls are currently initiated through the same
> 	mechanism as module_request and therefore get started with the
> 	init_nsproxy namespace. We'd really like them to run inside the
> 	same container as the process.  As part of the same problem,
> 	there is the issue of what to do with the dns resolver and
> 	Bryan's new keyring based idmapper code."

I'm not sure that I understand the problem correctly.

Currently, idmap uses dentry taken from client's cl_rpcclient->cl_path
(see nfs_idmap_new()). cl_rpcclient (and cl_path) is initialized with
rpcmount resolved against mount namespace of mount process (see
nfs_create_rpc_client()).
I assume it's correct.

-- 
 Kirill A. Shutemov
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] ppp: allow disabling multilink protocol ID compression
From: Paul Mackerras @ 2010-12-21 22:44 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: linux-ppp, netdev
In-Reply-To: <20101220195833.26786ec5@nehalam>

On Mon, Dec 20, 2010 at 07:58:33PM -0800, Stephen Hemminger wrote:

> Linux would not connect to other router running old version Cisco IOS (12.0).
> This is most likely a bug in that version of IOS, since it is fixed
> in later versions. As a workaround this patch allows a module parameter
> to be set to disable compressing the protocol ID.
> 
> See: https://bugzilla.vyatta.com/show_bug.cgi?id=3979
> 
> RFC 1990 allows an implementation to formulate MP fragments as if protocol
> compression had been negotiated.  This allows us to always send compressed
> protocol IDs.  But some implementations don't accept MP fragments with
> compressed protocol IDs.  This parameter allows us to interoperate with
> them.  The default value of the configurable parameter is the same as the
> current behavior:  protocol compression is enabled.  If protocol compression
> is disabled we will not send compressed protocol IDs.
> 
> This is based on an earlier patch by Bob Gilligan (using a sysctl).
> Module parameter is writable to allow for enabling even if ppp
> is already loaded for other uses.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Acked-by: Paul Mackerras <paulus@samba.org>

^ permalink raw reply

* Re: [PATCH net-2.6] net_sched: always clone skbs
From: Jarek Poplawski @ 2010-12-21 22:37 UTC (permalink / raw)
  To: Changli Gao; +Cc: hadi, Eric Dumazet, David S. Miller, netdev, Pawel Staszewski
In-Reply-To: <AANLkTikwMQBwNdT1nZ97NNFYGJnwhk39+8afTKONgeF5@mail.gmail.com>

On Tue, Dec 21, 2010 at 10:17:35PM +0800, Changli Gao wrote:
> On Tue, Dec 21, 2010 at 9:52 PM, jamal <hadi@cyberus.ca> wrote:
> >
> > Use to be we couldnt get the ifb+mirred combo to work
> > with TSO even without this change - i will have to dig old emails
> > to remember details. So we are making progress;-> I did write a set
> > of rules in: Documentation/networking/tc-actions-env-rules.txt
> > Changli optimized rule #1. When i looked at his patch it seems
> > to not harm that case. Sometimes dumb is a good principle ;->

Actually, when dumb isn't a good principle? ;->

Speaking about wrong things like optimizing this 1st commandment: it
seems, if we're stealing not shared skb it's not unreasonable to tell
others not to touch it anymore, and skip kfree_skb() in handle_ing(),
unless I miss something?


> In order to make my trick work. We need to assure dev_queue_xmit() and
> dev_hard_start_xmit() accept shared skbs. As Eric pointed, pktgen also
> need dev->netdev_ops->ndo_start_xmit() accept shared skbs. We need to
> fix every ndo_start_xmit() one by one, then dev_hard_start_xmit(), and
> when dev_queue_xmit() is also fixed, my trick can be added back.
> :)

I'm not sure pktgen is right - even if it's safe in its case, it seems
to break some older rules, and drivers should really own the things.
So it needs reviewing first.

Cheers,
Jarek P.

^ permalink raw reply

* Re: [PATCH V7 1/8] ntp: add ADJ_SETOFFSET mode bit
From: john stultz @ 2010-12-21 22:25 UTC (permalink / raw)
  To: Kuwahara,T.
  Cc: Richard Cochran, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Alan Cox, Arnd Bergmann, Christoph Lameter, David Miller,
	Krzysztof Halasa, Peter Zijlstra, Rodolfo Giometti,
	Thomas Gleixner
In-Reply-To: <AANLkTimJd5pScKTiDxuYd-h+NevYbCusyvUAqmUDXe8h-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Wed, 2010-12-22 at 05:57 +0900, Kuwahara,T. wrote:

> How about this?
> 
> if (txc->modes & ADJ_OFFSET) {
> 	if (txc->constant == INT32_MIN) {
> 		/* step time */
> 	} else {
> 		/* slew time */
> 	}
> }

This looks like magic behavior. Sort of a "knock twice and then say the
password" interface. I don't see why that would be better then adding a
clear new mode flag?


> That said, I'm somehow against the idea of using the adjtimex syscall
> for that purpose.

Could you expand on why you don't like this?

thanks
-john

^ permalink raw reply

* Re: [PATCH V7 1/8] ntp: add ADJ_SETOFFSET mode bit
From: john stultz @ 2010-12-21 21:59 UTC (permalink / raw)
  To: Kuwahara,T., Alexander Gordeev, Rodolfo Giometti
  Cc: Richard Cochran, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Alan Cox, Arnd Bergmann, Christoph Lameter, David Miller,
	Krzysztof Halasa, Peter Zijlstra, Rodolfo Giometti,
	Thomas Gleixner
In-Reply-To: <AANLkTikHV-qN73Zvvvxn76vtFFKG_VNTQhF3qpqTnOrE-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Wed, 2010-12-22 at 06:13 +0900, Kuwahara,T. wrote:
> On Wed, Dec 22, 2010 at 4:37 AM, john stultz <johnstul-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> wrote:
> > adjtimex is a linux specific interface, which is compatible but not
> > identical to the ntp specified interfaces. The ntp client code already
> > has Linux specific modifications, so I don't think we have to worry
> > about 0x40 specifically being reserved by the NTP client.
> 
> But struct timex is not linux-specific...

It is if you're compiling against linux's timex.h file. 

We already have a number of differences compared with BSD's timex mode
definitions:
We have ADJ_TICK: 0x4000, which is MOD_CLKB in FreeBSD.
We also have ADJ_OFFSET_SINGLESHOT and ADJ_OFFSET_SS_READ which allow
adjtimex act like the original ntp_adjtime.

The key bit is that we map the shared MOD_* definitions that the NTP
client uses to the linux specific ADJ_* values in the linux timex.h

However, your concern does bring up a good point: 0x40 is MOD_PPSMAX in
BSD, and we should at-least check to make sure that the PPS code that is
currently floating around on the lists and is in akpm's tree hasn't
already reserved that bit.

Rodolfo, Alexander: Any comments here?

thanks
-john

^ permalink raw reply

* Re: [RFC] ipv4: add ICMP socket kind
From: Solar Designer @ 2010-12-21 21:32 UTC (permalink / raw)
  To: Colin Walters; +Cc: Vasiliy Kulikov, netdev, linux-kernel, Pavel Kankovsky
In-Reply-To: <AANLkTin6_fck1SbCbX5tLvY8CPzJstqmpkJuVF-B6oC6@mail.gmail.com>

On Tue, Dec 21, 2010 at 03:46:15PM -0500, Colin Walters wrote:
> On Tue, Dec 21, 2010 at 2:46 PM, Solar Designer <solar@openwall.com> wrote:
> > We intend to have this sysctl'able and to have it restricted to a group
> > by default (the sysctl would set the GID) on our Linux distro,
> > Openwall GNU/*/Linux.  However, we figured that it'd be tough for us to
> > get this complication accepted into mainstream, so we opted to have the
> > patch posted for comment without it.
> 
> Right, a sysctl was the obvious thing to have.

If others don't mind either, we'd be very happy for this to get in.
Vasiliy can include it in the next revision of the patch he posts.

> >> If you really have a burning desire to get rid of setuid /bin/ping,
> >> why not just do it in userspace via message passing to/from a
> >> privileged process, and avoid a lot of code in the kernel?
> >
> > Yes, we thought of that, and we don't like this solution.
> 
> ...because?

The helper daemon process would need to be constantly running, and it
would consume somewhat more resources than just a piece of extra kernel
code would.  Its complexity would be slightly higher.  It may get
OOM-killed or the like (the cause may be external to it), after which
point the functionality will disappear.  We use OpenVZ containers
heavily, and we don't want to run an instance of such process per
container (up to a few hundred per system).  We also don't want to
introduce a container to host channel for this (would be a hack).  We
already have the process to kernel interface, which works equally well
from containers, so we prefer to just use that.

> > We similarly
> > (but for different reasons) don't like using fscaps to grant CAP_NET_RAW
> > to ping.
> 
> I am (learning now) about the fscaps drawbacks...

Here's some info on the topic:

http://www.openwall.com/lists/oss-security/2010/11/08/3

You may also want to read the follow-ups ("thread-next").  Unfortunately,
I failed to find time to continue that discussion thread myself, although
I still hope to.

> > We share your concern about the size of net/ipv4/ping.c introduced by
> > this patch, yet this is our current proposal.
> 
> To be clear I have no personal stake in the size of net/; my concern
> is more about the set of permissions granted by the default kernel
> configuration.  Both from an OS developer standpoint, and also just so
> I feel I can continue to explain to someone who's learning about Linux
> all the interactions between the uid/gid, capabilities, SELinux, etc.
> If the kernel starts adding extensions to things it historically
> didn't, that gets even more complicated.

Oh, this is a very minor change from this point of view.  There was
already a related change in 2.3.x that permitted Olaf Kirch's
implementation of traceroute to work as non-root on Linux 2.4+ (ability
to receive ICMP errors).  We only want the extra bits of functionality
needed for ping available as well.

> > We figured that there's little point behind such restrictions.  Just how
> > is an ICMP echo request any worse than a UDP packet of the same size?
> > Anyone can send the latter with current kernels.
> 
> Clearly if we could go back in time and make some changes to the
> default Unix security model, one of those would probably be making
> allocating TCP/UDP sockets a privileged operation in some way.  But
> the ship has sailed there...

I see no problem here.  It is OK to have this available to all users by
default, but then to allow, say, a privsep child process in a daemon to
give up some of the "privileges" of a traditional Unix process -
including socket allocation.

> > Yet, as I have mentioned, we're in fact going to restrict this to a
> > group by default and to have ping SGID - just not to expose the extra
> > kernel code for direct attack by a local user.  That's in case there's a
> > vulnerability in the added code.
> 
> So wait...this whole patch is to remove the setuid bit, but then
> you're going to go back and add setgid?  How is that really
> compellingly different and better?

ping would otherwise be either available to root only or SUID root.
If there are no other SUID root programs - like there are not in a
default install of Owl 3.0 (our passwd, crontab, at work without SUID
due to changes we made elsewhere) - then it means that ping's SUIDness
would expose the program startup code in ld/libc/kernel to potential
attacks.  There were numerous relevant vulnerabilities in the past.

SGID would not actually be a problem.  On the contrary, it would
introduce a second layer of security - protecting the kernel code
related to ICMP sockets from direct attacks.  If there's a vulnerability
in ld/libc/ping that results in a compromise of the group, this merely
removes this second layer of security.  It does not result in a
compromise of the system on its own.

You might want to refer to the following pages for info on how our
SUID-less userland is possible (without fscaps):

http://www.openwall.com/Owl/
http://www.openwall.com/tcb/
http://www.openwall.com/presentations/Owl/mgp00013.html
http://www.openwall.com/presentations/Owl/mgp00020.html
http://www.openwall.com/presentations/Owl/mgp00021.html
http://www.openwall.com/presentations/Owl/mgp00022.html
http://www.openwall.com/presentations/Owl/mgp00023.html

ping is the only problem we have not solved yet.  On Owl 3.0, we install
it mode 700 by default for this reason, and the "control ping public"
command makes it mode 4711 (re-introducing the risk).  Thus, this ICMP
sockets feature is fairly important for us.  It's the last missing bit.

ping also appears to be the only program that would benefit from fscaps
for real.  (In other cases, different solutions are possible and/or the
capability set would be root-equivalent.)  Rather than have the system
depend on fscaps, which is inconvenient (e.g., backup/restore issues),
we can address the specific case of ping.

Alexander

^ permalink raw reply

* Re: request-pull: Use static const
From: David Miller @ 2010-12-21 21:27 UTC (permalink / raw)
  To: joe; +Cc: netdev
In-Reply-To: <1292927173.1779.27.camel@Joe-Laptop>

From: Joe Perches <joe@perches.com>
Date: Tue, 21 Dec 2010 02:26:13 -0800

> Resubmitting the remnants of an earlier discrete patchset.
> https://lkml.org/lkml/2010/11/20/209
> Consolidated into smaller patches.
> 
> The following changes since commit d9993be65a77f500ae926176baa264816bfe3816:
> 
>   Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 (2010-12-20 13:24:14 -0800)
> 
> are available in the git repository at:
> 
>   git://repo.or.cz/linux-2.6/trivial-mods.git 20101221_static_const
> 
> Joe Perches (5):
>       tg3: Use DEFINE_PCI_DEVICE_TABLE
>       drivers/net/*.c: Use static const
>       usb: Use static const, consolidate code
>       tulip: Use DEFINE_PCI_DEVICE_TABLE and static const
>       drivers/net/*/: Use static const

Pulled, thanks Joe.

^ permalink raw reply

* Re: [PATCH 2/2 -next] sundance: Program station address into HW
From: David Miller @ 2010-12-21 21:25 UTC (permalink / raw)
  To: dkirjanov; +Cc: netdev
In-Reply-To: <20101221120226.GA18412@hera.kernel.org>

From: Denis Kirjanov <dkirjanov@kernel.org>
Date: Tue, 21 Dec 2010 12:02:26 +0000

> Program adapter's StationAddress register when changing device MAC address
> 
> Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org>

Applied.

^ permalink raw reply

* Re: [PATCH 1/2 -next] sundance: Wrap up acceess to ASICCtrl high word with a macro
From: David Miller @ 2010-12-21 21:25 UTC (permalink / raw)
  To: dkirjanov; +Cc: netdev
In-Reply-To: <20101221120136.GA16612@hera.kernel.org>

From: Denis Kirjanov <dkirjanov@kernel.org>
Date: Tue, 21 Dec 2010 12:01:36 +0000

> Wrap up acceess to ASICCtrl high word with a macro
> 
> Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org>

Applied.

^ permalink raw reply

* Re: [PATCH V7 1/8] ntp: add ADJ_SETOFFSET mode bit
From: Kuwahara,T. @ 2010-12-21 21:13 UTC (permalink / raw)
  To: john stultz
  Cc: Richard Cochran, linux-kernel, linux-api, netdev, Alan Cox,
	Arnd Bergmann, Christoph Lameter, David Miller, Krzysztof Halasa,
	Peter Zijlstra, Rodolfo Giometti, Thomas Gleixner
In-Reply-To: <1292960224.2618.4.camel@work-vm>

On Wed, Dec 22, 2010 at 4:37 AM, john stultz <johnstul@us.ibm.com> wrote:
> adjtimex is a linux specific interface, which is compatible but not
> identical to the ntp specified interfaces. The ntp client code already
> has Linux specific modifications, so I don't think we have to worry
> about 0x40 specifically being reserved by the NTP client.

But struct timex is not linux-specific...

^ permalink raw reply

* Re: Buglet in net/pkt_cls.h pointer handling.
From: David Miller @ 2010-12-21 20:57 UTC (permalink / raw)
  To: suckfish; +Cc: netdev
In-Reply-To: <20101216215627.43f34977.suckfish@ihug.co.nz>

From: Ralph Loader <suckfish@ihug.co.nz>
Date: Thu, 16 Dec 2010 21:56:27 +1300

> tcf_valid_offset() in net/pkt_cls.h appears to have a couple of 
> problems (obvious patch below):
> 
> (a) there is no check for overflow in the pointer arithmetic.
> (b) the pointers are presumably likely to be normally valid, so the
>     hint should be 'likely()' not 'unlikely()'.
> 
> The offsets used to construct the arguments to that function, e.g., as
> called in net/sched/em_u32.c, I think come from user-space & in theory
> could be crafted to cause an invalid pointer deref if ptr+len overflows?
> 
> Possibly the '<' and '>' in that function should be '<=' and '>='
> also.  I'm not familiar enough with the data-structures to be sure.
> 
> Also a question:  in em_u32.c em_u32_match(), and in cls_u32.c
> u32_classify(), we dereference pointers that have had an offset
> (originally from user space) added to them.  I can't see anything that
> keeps those pointers aligned.  Is that a problem on architectures that
> don't support unaligned pointers, or am I missing something?

Your analysis is accurate, so I added the <= and >= test changes
and applied the following to the tree.

Please read Documentation/SubmittingPatches and
Documentation/email-clients.txt before submitting your
own patches in the future.

What you sent here was whitespace damaged by your email client
and you didn't provide a proper "Signed-off-by: " tag in your
commit message.

Thanks.

--------------------
net: Fix range checks in tcf_valid_offset().

This function has three bugs:

1) The offset should be valid most of the time, this is just
   a sanity check, therefore we should use "likely" not "unlikely"

2) This is the only place where we can check for arithmetic overflow
   of the pointer plus the length.

3) The existing range checks are off by one, the valid range is
   skb->head to skb_tail_pointer(), inclusive.

Based almost entirely upon a patch by Ralph Loader.

Reported-by: Ralph Loader <suckfish@ihug.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/pkt_cls.h |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index dd3031a..9fcc680 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -323,7 +323,9 @@ static inline unsigned char * tcf_get_base_ptr(struct sk_buff *skb, int layer)
 static inline int tcf_valid_offset(const struct sk_buff *skb,
 				   const unsigned char *ptr, const int len)
 {
-	return unlikely((ptr + len) < skb_tail_pointer(skb) && ptr > skb->head);
+	return likely((ptr + len) <= skb_tail_pointer(skb) &&
+		      ptr >= skb->head &&
+		      (ptr <= (ptr + len)));
 }
 
 #ifdef CONFIG_NET_CLS_IND
-- 
1.7.3.4


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox