Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH repost] sched: export sched_set/getaffinity to modules
From: Oleg Nesterov @ 2010-07-01 14:46 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Michael S. Tsirkin, Peter Zijlstra, Ingo Molnar,
	Sridhar Samudrala, netdev, lkml, kvm@vger.kernel.org,
	Andrew Morton, Dmitri Vorobiev, Jiri Kosina, Thomas Gleixner,
	Andi Kleen
In-Reply-To: <4C2CA5C5.4040402@kernel.org>

On 07/01, Tejun Heo wrote:
>
> All that's necessary is shortcutting indirection through kthreadd.
> ie. An exported function which looks like the following,
>
>  struct kthread_clone_or_whatever(int (*threadfn).....)
>  {
> 	struct kthread_create_info create;
> 	int pid;
>
> 	INIT create;
>
> 	pid = kernel_thread(kthread, &create, CLONE_FS...);
> 	if (pid < 0)
> 		return ERROR;
> 	wait_for_completion(&create.done);
>
> 	if (!IS_ERR(create.result))
> 		SET NAME;
> 	return create.result;
>  }
>
> It might be a good idea to make the function take extra clone flags
> but anyways once created cloned task can be treated the same way as
> other kthreads, so nothing else needs to be changed.

This makes kthread_stop() work. Otherwise the new thread is just
the CLONE_VM child of the caller, and the caller is the user-mode
task doing ioctl() ?

Oleg.


^ permalink raw reply

* Re: [PATCH repost] sched: export sched_set/getaffinity to modules
From: Oleg Nesterov @ 2010-07-01 14:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Michael S. Tsirkin, Ingo Molnar, Sridhar Samudrala, Tejun Heo,
	netdev, lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
	Jiri Kosina, Thomas Gleixner, Andi Kleen
In-Reply-To: <1277991024.1917.108.camel@laptop>

On 07/01, Peter Zijlstra wrote:
>
> On Thu, 2010-07-01 at 16:08 +0300, Michael S. Tsirkin wrote:
> > Maybe it makes sense to add kthread_clone (in addition to
> > kthread_create) that would do what you suggest?
> > If yes, any hints on an implementation?
>
> I think that's called kernel_thread() see
> kernel/kthread.c:create_kthread().

Well, strictly speaking kernel_thread() doesn't create the kernel thread.
Unless the caller is the kernel thread. And daemonize() is deprecated.
kernel_thread() just forks the CLONE_VM + flags child.

Oleg.


^ permalink raw reply

* Re: [PATCH repost] sched: export sched_set/getaffinity to modules
From: Tejun Heo @ 2010-07-01 14:27 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Peter Zijlstra, Ingo Molnar, Sridhar Samudrala, Oleg Nesterov,
	netdev, lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
	Jiri Kosina, Thomas Gleixner, Andi Kleen
In-Reply-To: <20100701133956.GD32223@redhat.com>

Hello,

On 07/01/2010 03:39 PM, Michael S. Tsirkin wrote:
>> I think that's called kernel_thread() see
>> kernel/kthread.c:create_kthread().
>>
>> Doing the whole kthreadd dance and then copying bits and pieces back
>> sounds very fragile, so yeah, something like that should work.
> 
> Thanks!
> Sridhar, Tejun, have the time to look into this approach?

All that's necessary is shortcutting indirection through kthreadd.
ie. An exported function which looks like the following,

 struct kthread_clone_or_whatever(int (*threadfn).....)
 {
	struct kthread_create_info create;
	int pid;

	INIT create;

	pid = kernel_thread(kthread, &create, CLONE_FS...);
	if (pid < 0)
		return ERROR;
	wait_for_completion(&create.done);

	if (!IS_ERR(create.result))
		SET NAME;
	return create.result;
 }

It might be a good idea to make the function take extra clone flags
but anyways once created cloned task can be treated the same way as
other kthreads, so nothing else needs to be changed.

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [PATCH repost] sched: export sched_set/getaffinity to modules
From: Peter Zijlstra @ 2010-07-01 13:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Ingo Molnar, Sridhar Samudrala, Tejun Heo, Oleg Nesterov, netdev,
	lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
	Jiri Kosina, Thomas Gleixner, Andi Kleen
In-Reply-To: <20100701133956.GD32223@redhat.com>

On Thu, 2010-07-01 at 16:39 +0300, Michael S. Tsirkin wrote:
> 
> The proposed patch kills the thread when the fd is closed,
> so I think this already works without making it part of the process.
> 
OK, fd bound resources are fine.

^ permalink raw reply

* [PATCH net-next-2.6] be2net: changes to properly provide phy details
From: Ajit Khaparde @ 2010-07-01 13:51 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

be2net driver is currently not showing correct phy details in certain cases.
This patch fixes it.

Signed-off-by: Ajit Khaparde <ajitk@serverengines.com>
---
 drivers/net/benet/be.h         |    1 +
 drivers/net/benet/be_cmds.c    |   35 +++++++++++++++++++++++++
 drivers/net/benet/be_cmds.h    |   27 +++++++++++++++++++
 drivers/net/benet/be_ethtool.c |   55 +++++++++++++++++++++++++++++----------
 4 files changed, 104 insertions(+), 14 deletions(-)

diff --git a/drivers/net/benet/be.h b/drivers/net/benet/be.h
index b46be49..1a0d2d0 100644
--- a/drivers/net/benet/be.h
+++ b/drivers/net/benet/be.h
@@ -282,6 +282,7 @@ struct be_adapter {
 	int link_speed;
 	u8 port_type;
 	u8 transceiver;
+	u8 autoneg;
 	u8 generation;		/* BladeEngine ASIC generation */
 	u32 flash_status;
 	struct completion flash_compl;
diff --git a/drivers/net/benet/be_cmds.c b/drivers/net/benet/be_cmds.c
index 65e3260..344e062 100644
--- a/drivers/net/benet/be_cmds.c
+++ b/drivers/net/benet/be_cmds.c
@@ -1695,3 +1695,38 @@ int be_cmd_get_seeprom_data(struct be_adapter *adapter,
 	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
+
+int be_cmd_get_phy_info(struct be_adapter *adapter, struct be_dma_mem *cmd)
+{
+	struct be_mcc_wrb *wrb;
+	struct be_cmd_req_get_phy_info *req;
+	struct be_sge *sge;
+	int status;
+
+	spin_lock_bh(&adapter->mcc_lock);
+
+	wrb = wrb_from_mccq(adapter);
+	if (!wrb) {
+		status = -EBUSY;
+		goto err;
+	}
+
+	req = cmd->va;
+	sge = nonembedded_sgl(wrb);
+
+	be_wrb_hdr_prepare(wrb, sizeof(*req), false, 1,
+				OPCODE_COMMON_GET_PHY_DETAILS);
+
+	be_cmd_hdr_prepare(&req->hdr, CMD_SUBSYSTEM_COMMON,
+			OPCODE_COMMON_GET_PHY_DETAILS,
+			sizeof(*req));
+
+	sge->pa_hi = cpu_to_le32(upper_32_bits(cmd->dma));
+	sge->pa_lo = cpu_to_le32(cmd->dma & 0xFFFFFFFF);
+	sge->len = cpu_to_le32(cmd->size);
+
+	status = be_mcc_notify_wait(adapter);
+err:
+	spin_unlock_bh(&adapter->mcc_lock);
+	return status;
+}
diff --git a/drivers/net/benet/be_cmds.h b/drivers/net/benet/be_cmds.h
index 763dc19..912a058 100644
--- a/drivers/net/benet/be_cmds.h
+++ b/drivers/net/benet/be_cmds.h
@@ -144,6 +144,7 @@ struct be_mcc_mailbox {
 #define OPCODE_COMMON_ENABLE_DISABLE_BEACON		69
 #define OPCODE_COMMON_GET_BEACON_STATE			70
 #define OPCODE_COMMON_READ_TRANSRECV_DATA		73
+#define OPCODE_COMMON_GET_PHY_DETAILS			102
 
 #define OPCODE_ETH_ACPI_CONFIG				2
 #define OPCODE_ETH_PROMISCUOUS				3
@@ -869,6 +870,30 @@ struct be_cmd_resp_seeprom_read {
 	u8 seeprom_data[BE_READ_SEEPROM_LEN];
 };
 
+enum {
+	PHY_TYPE_CX4_10GB = 0,
+	PHY_TYPE_XFP_10GB,
+	PHY_TYPE_SFP_1GB,
+	PHY_TYPE_SFP_PLUS_10GB,
+	PHY_TYPE_KR_10GB,
+	PHY_TYPE_KX4_10GB,
+	PHY_TYPE_BASET_10GB,
+	PHY_TYPE_BASET_1GB,
+	PHY_TYPE_DISABLED = 255
+};
+
+struct be_cmd_req_get_phy_info {
+	struct be_cmd_req_hdr hdr;
+	u8 rsvd0[24];
+};
+struct be_cmd_resp_get_phy_info {
+	struct be_cmd_req_hdr hdr;
+	u16 phy_type;
+	u16 interface_type;
+	u32 misc_params;
+	u32 future_use[4];
+};
+
 extern int be_pci_fnum_get(struct be_adapter *adapter);
 extern int be_cmd_POST(struct be_adapter *adapter);
 extern int be_cmd_mac_addr_query(struct be_adapter *adapter, u8 *mac_addr,
@@ -947,4 +972,6 @@ extern int be_cmd_get_seeprom_data(struct be_adapter *adapter,
 				struct be_dma_mem *nonemb_cmd);
 extern int be_cmd_set_loopback(struct be_adapter *adapter, u8 port_num,
 				u8 loopback_type, u8 enable);
+extern int be_cmd_get_phy_info(struct be_adapter *adapter,
+		struct be_dma_mem *cmd);
 
diff --git a/drivers/net/benet/be_ethtool.c b/drivers/net/benet/be_ethtool.c
index 200e985..c0ade24 100644
--- a/drivers/net/benet/be_ethtool.c
+++ b/drivers/net/benet/be_ethtool.c
@@ -314,10 +314,13 @@ static int be_get_sset_count(struct net_device *netdev, int stringset)
 static int be_get_settings(struct net_device *netdev, struct ethtool_cmd *ecmd)
 {
 	struct be_adapter *adapter = netdev_priv(netdev);
-	u8 mac_speed = 0, connector = 0;
+	struct be_dma_mem phy_cmd;
+	struct be_cmd_resp_get_phy_info *resp;
+	u8 mac_speed = 0;
 	u16 link_speed = 0;
 	bool link_up = false;
 	int status;
+	u16 intf_type;
 
 	if (adapter->link_speed < 0) {
 		status = be_cmd_link_status_query(adapter, &link_up,
@@ -337,40 +340,57 @@ static int be_get_settings(struct net_device *netdev, struct ethtool_cmd *ecmd)
 			}
 		}
 
-		status = be_cmd_read_port_type(adapter, adapter->port_num,
-						&connector);
+		phy_cmd.size = sizeof(struct be_cmd_req_get_phy_info);
+		phy_cmd.va = pci_alloc_consistent(adapter->pdev, phy_cmd.size,
+					&phy_cmd.dma);
+		if (!phy_cmd.va) {
+			dev_err(&adapter->pdev->dev, "Memory alloc failure\n");
+			return -ENOMEM;
+		}
+		status = be_cmd_get_phy_info(adapter, &phy_cmd);
 		if (!status) {
-			switch (connector) {
-			case 7:
+			resp = (struct be_cmd_resp_get_phy_info *) phy_cmd.va;
+			intf_type = le16_to_cpu(resp->interface_type);
+
+			switch (intf_type) {
+			case PHY_TYPE_XFP_10GB:
+			case PHY_TYPE_SFP_1GB:
+			case PHY_TYPE_SFP_PLUS_10GB:
 				ecmd->port = PORT_FIBRE;
-				ecmd->transceiver = XCVR_EXTERNAL;
-				break;
-			case 0:
-				ecmd->port = PORT_TP;
-				ecmd->transceiver = XCVR_EXTERNAL;
 				break;
 			default:
 				ecmd->port = PORT_TP;
-				ecmd->transceiver = XCVR_INTERNAL;
 				break;
 			}
-		} else {
-			ecmd->port = PORT_AUI;
+
+			switch (intf_type) {
+			case PHY_TYPE_KR_10GB:
+			case PHY_TYPE_KX4_10GB:
+				ecmd->autoneg = AUTONEG_ENABLE;
 			ecmd->transceiver = XCVR_INTERNAL;
+				break;
+			default:
+				ecmd->autoneg = AUTONEG_DISABLE;
+				ecmd->transceiver = XCVR_EXTERNAL;
+				break;
+			}
 		}
 
 		/* Save for future use */
 		adapter->link_speed = ecmd->speed;
 		adapter->port_type = ecmd->port;
 		adapter->transceiver = ecmd->transceiver;
+		adapter->autoneg = ecmd->autoneg;
+		pci_free_consistent(adapter->pdev, phy_cmd.size,
+					phy_cmd.va, phy_cmd.dma);
 	} else {
 		ecmd->speed = adapter->link_speed;
 		ecmd->port = adapter->port_type;
 		ecmd->transceiver = adapter->transceiver;
+		ecmd->autoneg = adapter->autoneg;
 	}
 
 	ecmd->duplex = DUPLEX_FULL;
-	ecmd->autoneg = AUTONEG_DISABLE;
 	ecmd->phy_address = adapter->port_num;
 	switch (ecmd->port) {
 	case PORT_FIBRE:
@@ -384,6 +404,13 @@ static int be_get_settings(struct net_device *netdev, struct ethtool_cmd *ecmd)
 		break;
 	}
 
+	if (ecmd->autoneg) {
+		ecmd->supported |= SUPPORTED_1000baseT_Full;
+		ecmd->supported |= SUPPORTED_Autoneg;
+		ecmd->advertising |= (ADVERTISED_10000baseT_Full |
+				ADVERTISED_1000baseT_Full);
+	}
+
 	return 0;
 }
 
-- 
1.7.0.4


^ permalink raw reply related

* Re: [PATCH repost] sched: export sched_set/getaffinity to modules
From: Michael S. Tsirkin @ 2010-07-01 13:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Sridhar Samudrala, Tejun Heo, Oleg Nesterov, netdev,
	lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
	Jiri Kosina, Thomas Gleixner, Andi Kleen
In-Reply-To: <1277991024.1917.108.camel@laptop>

On Thu, Jul 01, 2010 at 03:30:24PM +0200, Peter Zijlstra wrote:
> On Thu, 2010-07-01 at 16:08 +0300, Michael S. Tsirkin wrote:
> > On Thu, Jul 01, 2010 at 02:46:35PM +0200, Peter Zijlstra wrote:
> > > On Thu, 2010-07-01 at 14:34 +0200, Peter Zijlstra wrote:
> > > > On Thu, 2010-07-01 at 15:23 +0300, Michael S. Tsirkin wrote:
> > > > > 
> > > > > The patch using this is here:
> > > > > http://www.mail-archive.com/kvm@vger.kernel.org/msg35411.html
> > > > > 
> > > > > It simply copies the affinity from the parent when thread is created.
> > > > 
> > > > Sounds like policy, not something the kernel should do..
> > > 
> > > The alternative would be using clone() instead of thread_create() and
> > > inherit everything from the creating task.
> > > Inheriting from kthreadd and then undoing some aspects just sounds
> > > like daft policy that really ought to be in userspace.
> > 
> > Yes, that's basically what this patchset is trying to do:
> > create a workqueue inheriting everything from the creating task.
> > Sridhar started with an API to do exactly this:
> > http://linux.derkeiler.com/Mailing-Lists/Kernel/2010-05/msg07478.html
> > 
> > Then we switched to raw kthread to avoid stepping on cwq toes.
> > Maybe it makes sense to add kthread_clone (in addition to
> > kthread_create) that would do what you suggest?
> > If yes, any hints on an implementation?
> 
> I think that's called kernel_thread() see
> kernel/kthread.c:create_kthread().
> 
> Doing the whole kthreadd dance and then copying bits and pieces back
> sounds very fragile, so yeah, something like that should work.


Thanks!
Sridhar, Tejun, have the time to look into this approach?

> The other issue to consider is the thread group status of these things,
> I think it would be best if these threads were still considered part of
> the process that spawned them so that they would die nicely when the
> process gets whacked.

The proposed patch kills the thread when the fd is closed,
so I think this already works without making it part of the process.

> At which point one could wonder if the kthread interface makes any
> sense, why not let userspace fork tasks and let them call into the
> kernel to perform work...

One thing I wanted to avoid is letting userspace know
just how many threads are there. We are using a single one
now, but we used to have threads per-cpu, and we might
switch to a thread per virtqueue in the future.
IMO all this should ideally be transparent to userspace.

-- 
MST

^ permalink raw reply

* ::::::APPLY FOR A LOAN:::::
From: Private Fund Loan Inc @ 2010-07-01 13:42 UTC (permalink / raw)
  To: inq

We offer loan at cheap rate, they say opprtunity does not knock twice.
But with our loan firm, you can get the cheapest loan at a low
percentage rate of 2% yearly from $5,000.00 Min. To $100 000 000.00 Max.

Contact us via email if you are interested with the following details
NAME: 
PHONE: 
LOAN DURATION:
ADDRESS: 
AMOUNT:

Private Fund Loan Inc
+60102950738


^ permalink raw reply

* Re: [PATCH repost] sched: export sched_set/getaffinity to modules
From: Peter Zijlstra @ 2010-07-01 13:30 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Ingo Molnar, Sridhar Samudrala, Tejun Heo, Oleg Nesterov, netdev,
	lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
	Jiri Kosina, Thomas Gleixner, Andi Kleen
In-Reply-To: <20100701130816.GB32223@redhat.com>

On Thu, 2010-07-01 at 16:08 +0300, Michael S. Tsirkin wrote:
> On Thu, Jul 01, 2010 at 02:46:35PM +0200, Peter Zijlstra wrote:
> > On Thu, 2010-07-01 at 14:34 +0200, Peter Zijlstra wrote:
> > > On Thu, 2010-07-01 at 15:23 +0300, Michael S. Tsirkin wrote:
> > > > 
> > > > The patch using this is here:
> > > > http://www.mail-archive.com/kvm@vger.kernel.org/msg35411.html
> > > > 
> > > > It simply copies the affinity from the parent when thread is created.
> > > 
> > > Sounds like policy, not something the kernel should do..
> > 
> > The alternative would be using clone() instead of thread_create() and
> > inherit everything from the creating task.
> > Inheriting from kthreadd and then undoing some aspects just sounds
> > like daft policy that really ought to be in userspace.
> 
> Yes, that's basically what this patchset is trying to do:
> create a workqueue inheriting everything from the creating task.
> Sridhar started with an API to do exactly this:
> http://linux.derkeiler.com/Mailing-Lists/Kernel/2010-05/msg07478.html
> 
> Then we switched to raw kthread to avoid stepping on cwq toes.
> Maybe it makes sense to add kthread_clone (in addition to
> kthread_create) that would do what you suggest?
> If yes, any hints on an implementation?

I think that's called kernel_thread() see
kernel/kthread.c:create_kthread().

Doing the whole kthreadd dance and then copying bits and pieces back
sounds very fragile, so yeah, something like that should work.

The other issue to consider is the thread group status of these things,
I think it would be best if these threads were still considered part of
the process that spawned them so that they would die nicely when the
process gets whacked.

At which point one could wonder if the kthread interface makes any
sense, why not let userspace fork tasks and let them call into the
kernel to perform work...

^ permalink raw reply

* Re: [PATCH repost] sched: export sched_set/getaffinity to modules
From: Michael S. Tsirkin @ 2010-07-01 13:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Sridhar Samudrala, Tejun Heo, Oleg Nesterov, netdev,
	lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
	Jiri Kosina, Thomas Gleixner, Andi Kleen
In-Reply-To: <1277989646.1917.77.camel@laptop>

On Thu, Jul 01, 2010 at 03:07:26PM +0200, Peter Zijlstra wrote:
> On Thu, 2010-07-01 at 15:50 +0300, Michael S. Tsirkin wrote:
> > On Thu, Jul 01, 2010 at 02:32:43PM +0200, Peter Zijlstra wrote:
> > > On Thu, 2010-07-01 at 14:55 +0300, Michael S. Tsirkin wrote:
> > > 
> > > > >  - why can't it set the kernel thread's affinity too?
> > > > 
> > > > It can. However: the threads are started internally by the driver
> > > > when qemu does an ioctl.  What we want to do is give it a sensible
> > > > default affinity. management tool can later tweak it if it wants to.
> > > 
> > > So have that ioctl return the tid of that new fancy thread and then set
> > > its affinity, stuff it in cgroup, whatever you fancy.
> > > 
> > > > >  - what happens if someone changes the tasks' affinity?
> > > > 
> > > > We would normally create a cgroup including all internal
> > > > tasks, making it easy to find and change affinity for
> > > > them all if necessary. 
> > > 
> > > And to stuff them in a cgroup you also need the tid, at which point it
> > > might as well set the affinity from userspace, right?
> > 
> > We also put it in a cgroup transparently. I think that it's actually
> > important to do it on thread creation: if we don't, malicious userspace
> > can create large amount of work exceeding the cgroup limits.
> > 
> > And the same applies so the affinity, right? If the qemu process
> > is limited to a set of CPUs, isn't it important to make
> > the kernel thread that does work our behalf limited to the same
> > set of CPUs?
> 
> I'm not sure we have anything like this, but I wouldn't think so, if a
> driver creates a kthread and manages to inject tons of work its not
> typically limited to whatever limits apply to the task that supplied the
> work.
> 
> Take the encryption threads for example, those don't run in the context
> of whoever provides the data to be encrypted (file,net whatever) and
> thus the task responsible could consume heaps more resources than when
> it would have to do the encryption itself.
> 
> That's how kthreads work.

Right. And IMHO ideally all such work would run on the appropriate
CPU and be accounted to. It's just that with virt people seem to
run untrusted applications and expect the damage to be contained.
So we came up with a simple approach that seems to do the
just just for us.

-- 
MST

^ permalink raw reply

* Re: [PATCH repost] sched: export sched_set/getaffinity to modules
From: Michael S. Tsirkin @ 2010-07-01 13:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Sridhar Samudrala, Tejun Heo, Oleg Nesterov, netdev,
	lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
	Jiri Kosina, Thomas Gleixner, Andi Kleen
In-Reply-To: <1277988395.1917.47.camel@laptop>

On Thu, Jul 01, 2010 at 02:46:35PM +0200, Peter Zijlstra wrote:
> On Thu, 2010-07-01 at 14:34 +0200, Peter Zijlstra wrote:
> > On Thu, 2010-07-01 at 15:23 +0300, Michael S. Tsirkin wrote:
> > > 
> > > The patch using this is here:
> > > http://www.mail-archive.com/kvm@vger.kernel.org/msg35411.html
> > > 
> > > It simply copies the affinity from the parent when thread is created.
> > 
> > Sounds like policy, not something the kernel should do..
> 
> The alternative would be using clone() instead of thread_create() and
> inherit everything from the creating task.
> Inheriting from kthreadd and then undoing some aspects just sounds
> like daft policy that really ought to be in userspace.

Yes, that's basically what this patchset is trying to do:
create a workqueue inheriting everything from the creating task.
Sridhar started with an API to do exactly this:
http://linux.derkeiler.com/Mailing-Lists/Kernel/2010-05/msg07478.html

Then we switched to raw kthread to avoid stepping on cwq toes.
Maybe it makes sense to add kthread_clone (in addition to
kthread_create) that would do what you suggest?
If yes, any hints on an implementation?


-- 
MST

^ permalink raw reply

* Re: [PATCH repost] sched: export sched_set/getaffinity to modules
From: Peter Zijlstra @ 2010-07-01 13:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Ingo Molnar, Sridhar Samudrala, Tejun Heo, Oleg Nesterov, netdev,
	lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
	Jiri Kosina, Thomas Gleixner, Andi Kleen
In-Reply-To: <20100701125038.GA32223@redhat.com>

On Thu, 2010-07-01 at 15:50 +0300, Michael S. Tsirkin wrote:
> On Thu, Jul 01, 2010 at 02:32:43PM +0200, Peter Zijlstra wrote:
> > On Thu, 2010-07-01 at 14:55 +0300, Michael S. Tsirkin wrote:
> > 
> > > >  - why can't it set the kernel thread's affinity too?
> > > 
> > > It can. However: the threads are started internally by the driver
> > > when qemu does an ioctl.  What we want to do is give it a sensible
> > > default affinity. management tool can later tweak it if it wants to.
> > 
> > So have that ioctl return the tid of that new fancy thread and then set
> > its affinity, stuff it in cgroup, whatever you fancy.
> > 
> > > >  - what happens if someone changes the tasks' affinity?
> > > 
> > > We would normally create a cgroup including all internal
> > > tasks, making it easy to find and change affinity for
> > > them all if necessary. 
> > 
> > And to stuff them in a cgroup you also need the tid, at which point it
> > might as well set the affinity from userspace, right?
> 
> We also put it in a cgroup transparently. I think that it's actually
> important to do it on thread creation: if we don't, malicious userspace
> can create large amount of work exceeding the cgroup limits.
> 
> And the same applies so the affinity, right? If the qemu process
> is limited to a set of CPUs, isn't it important to make
> the kernel thread that does work our behalf limited to the same
> set of CPUs?

I'm not sure we have anything like this, but I wouldn't think so, if a
driver creates a kthread and manages to inject tons of work its not
typically limited to whatever limits apply to the task that supplied the
work.

Take the encryption threads for example, those don't run in the context
of whoever provides the data to be encrypted (file,net whatever) and
thus the task responsible could consume heaps more resources than when
it would have to do the encryption itself.

That's how kthreads work.

^ permalink raw reply

* [PATCH 2/2] qlge: fix a eeh handler to not add a pending timer
From: leitao @ 2010-07-01 13:00 UTC (permalink / raw)
  To: davem; +Cc: netdev, Breno Leitao, Ron Mercer
In-Reply-To: <e687077281d05d3a2da49431b7c0ff0b1076f3e6.1277936929.git.root@sanx1002.austin.ibm.com>

On some ocasions the function qlge_io_resume() tries to add a
pending timer, which causes the system to hit the BUG() on
add_timer() function.

This patch removes the timer during the EEH recovery.

Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com>
Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
---
 drivers/net/qlge/qlge_main.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/qlge/qlge_main.c b/drivers/net/qlge/qlge_main.c
index 509dadc..d10bcef 100644
--- a/drivers/net/qlge/qlge_main.c
+++ b/drivers/net/qlge/qlge_main.c
@@ -4712,6 +4712,8 @@ static void ql_eeh_close(struct net_device *ndev)
 		netif_stop_queue(ndev);
 	}
 
+	/* Disabling the timer */
+	del_timer_sync(&qdev->timer);
 	if (test_bit(QL_ADAPTER_UP, &qdev->flags))
 		cancel_delayed_work_sync(&qdev->asic_reset_work);
 	cancel_delayed_work_sync(&qdev->mpi_reset_work);
-- 
1.6.5.2


^ permalink raw reply related

* [PATCH 1/2] qlge: Replacing add_timer() to mod_timer()
From: leitao @ 2010-07-01 13:00 UTC (permalink / raw)
  To: davem; +Cc: netdev, Breno Leitao, Ron Mercer

Currently qlge driver calls add_timer() instead of mod_timer().
This patch changes add_timer() to mod_timer(), which seems a better
solution.

Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com>
Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
---
 drivers/net/qlge/qlge_main.c |    9 +++------
 1 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/net/qlge/qlge_main.c b/drivers/net/qlge/qlge_main.c
index fa4b24c..509dadc 100644
--- a/drivers/net/qlge/qlge_main.c
+++ b/drivers/net/qlge/qlge_main.c
@@ -4611,8 +4611,7 @@ static void ql_timer(unsigned long data)
 		return;
 	}
 
-	qdev->timer.expires = jiffies + (5*HZ);
-	add_timer(&qdev->timer);
+	mod_timer(&qdev->timer, jiffies + (5*HZ));
 }
 
 static int __devinit qlge_probe(struct pci_dev *pdev,
@@ -4808,8 +4807,7 @@ static void qlge_io_resume(struct pci_dev *pdev)
 		netif_err(qdev, ifup, qdev->ndev,
 			  "Device was not running prior to EEH.\n");
 	}
-	qdev->timer.expires = jiffies + (5*HZ);
-	add_timer(&qdev->timer);
+	mod_timer(&qdev->timer, jiffies + (5*HZ));
 	netif_device_attach(ndev);
 }
 
@@ -4871,8 +4869,7 @@ static int qlge_resume(struct pci_dev *pdev)
 			return err;
 	}
 
-	qdev->timer.expires = jiffies + (5*HZ);
-	add_timer(&qdev->timer);
+	mod_timer(&qdev->timer, jiffies + (5*HZ));
 	netif_device_attach(ndev);
 
 	return 0;
-- 
1.6.5.2


^ permalink raw reply related

* Re: [PATCH repost] sched: export sched_set/getaffinity to modules
From: Michael S. Tsirkin @ 2010-07-01 12:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Sridhar Samudrala, Tejun Heo, Oleg Nesterov, netdev,
	lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
	Jiri Kosina, Thomas Gleixner, Andi Kleen
In-Reply-To: <1277987563.1917.28.camel@laptop>

On Thu, Jul 01, 2010 at 02:32:43PM +0200, Peter Zijlstra wrote:
> On Thu, 2010-07-01 at 14:55 +0300, Michael S. Tsirkin wrote:
> 
> > >  - why can't it set the kernel thread's affinity too?
> > 
> > It can. However: the threads are started internally by the driver
> > when qemu does an ioctl.  What we want to do is give it a sensible
> > default affinity. management tool can later tweak it if it wants to.
> 
> So have that ioctl return the tid of that new fancy thread and then set
> its affinity, stuff it in cgroup, whatever you fancy.
> 
> > >  - what happens if someone changes the tasks' affinity?
> > 
> > We would normally create a cgroup including all internal
> > tasks, making it easy to find and change affinity for
> > them all if necessary. 
> 
> And to stuff them in a cgroup you also need the tid, at which point it
> might as well set the affinity from userspace, right?

We also put it in a cgroup transparently. I think that it's actually
important to do it on thread creation: if we don't, malicious userspace
can create large amount of work exceeding the cgroup limits.

And the same applies so the affinity, right? If the qemu process
is limited to a set of CPUs, isn't it important to make
the kernel thread that does work our behalf limited to the same
set of CPUs?

-- 
MST

^ permalink raw reply

* Re: [PATCH repost] sched: export sched_set/getaffinity to modules
From: Peter Zijlstra @ 2010-07-01 12:46 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Ingo Molnar, Sridhar Samudrala, Tejun Heo, Oleg Nesterov, netdev,
	lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
	Jiri Kosina, Thomas Gleixner, Andi Kleen
In-Reply-To: <1277987657.1917.32.camel@laptop>

On Thu, 2010-07-01 at 14:34 +0200, Peter Zijlstra wrote:
> On Thu, 2010-07-01 at 15:23 +0300, Michael S. Tsirkin wrote:
> > 
> > The patch using this is here:
> > http://www.mail-archive.com/kvm@vger.kernel.org/msg35411.html
> > 
> > It simply copies the affinity from the parent when thread is created.
> 
> Sounds like policy, not something the kernel should do..

The alternative would be using clone() instead of thread_create() and
inherit everything from the creating task. Inheriting from kthreadd and
then undoing some aspects just sounds like daft policy that really ought
to be in userspace.

^ permalink raw reply

* Re: [PATCH repost] sched: export sched_set/getaffinity to modules
From: Peter Zijlstra @ 2010-07-01 12:34 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Ingo Molnar, Sridhar Samudrala, Tejun Heo, Oleg Nesterov, netdev,
	lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
	Jiri Kosina, Thomas Gleixner, Andi Kleen
In-Reply-To: <20100701122340.GB31333@redhat.com>

On Thu, 2010-07-01 at 15:23 +0300, Michael S. Tsirkin wrote:
> 
> The patch using this is here:
> http://www.mail-archive.com/kvm@vger.kernel.org/msg35411.html
> 
> It simply copies the affinity from the parent when thread is created.

Sounds like policy, not something the kernel should do..

^ permalink raw reply

* Re: [PATCH repost] sched: export sched_set/getaffinity to modules
From: Peter Zijlstra @ 2010-07-01 12:32 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Ingo Molnar, Sridhar Samudrala, Tejun Heo, Oleg Nesterov, netdev,
	lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
	Jiri Kosina, Thomas Gleixner, Andi Kleen
In-Reply-To: <20100701115507.GA31333@redhat.com>

On Thu, 2010-07-01 at 14:55 +0300, Michael S. Tsirkin wrote:

> >  - why can't it set the kernel thread's affinity too?
> 
> It can. However: the threads are started internally by the driver
> when qemu does an ioctl.  What we want to do is give it a sensible
> default affinity. management tool can later tweak it if it wants to.

So have that ioctl return the tid of that new fancy thread and then set
its affinity, stuff it in cgroup, whatever you fancy.

> >  - what happens if someone changes the tasks' affinity?
> 
> We would normally create a cgroup including all internal
> tasks, making it easy to find and change affinity for
> them all if necessary. 

And to stuff them in a cgroup you also need the tid, at which point it
might as well set the affinity from userspace, right?

^ permalink raw reply

* Re: [PATCH repost] sched: export sched_set/getaffinity to modules
From: Michael S. Tsirkin @ 2010-07-01 12:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Sridhar Samudrala, Tejun Heo, Oleg Nesterov, netdev,
	lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
	Jiri Kosina, Thomas Gleixner, Andi Kleen
In-Reply-To: <20100701115507.GA31333@redhat.com>

On Thu, Jul 01, 2010 at 02:55:07PM +0300, Michael S. Tsirkin wrote:
> On Thu, Jul 01, 2010 at 01:43:23PM +0200, Peter Zijlstra wrote:
> > On Thu, 2010-07-01 at 13:19 +0200, Peter Zijlstra wrote:
> > > On Thu, 2010-07-01 at 14:07 +0300, Michael S. Tsirkin wrote:
> > > > Author: Sridhar Samudrala <sri@us.ibm.com>
> > > > 
> > > > sched: export sched_set/getaffinity to modules
> > > > 
> > > > vhost-net driver wants to copy the affinity from the
> > > > owner thread to thread it creates. Export
> > > > sched_set/get affinity to modules to make this possible
> > > > when vhost is built as a module.
> > 
> > > Urgh,.. so why again is that a good idea?
> > 
> > In particular:
> >  - who sets the affinity of the task? 
> 
> management tools do this when they start qemu.
> 
> >  - why can't it set the kernel thread's affinity too?
> 
> It can. However: the threads are started internally by the driver
> when qemu does an ioctl.  What we want to do is give it a sensible
> default affinity. management tool can later tweak it if it wants to.
> 
> >  - what happens if someone changes the tasks' affinity?
> 
> We would normally create a cgroup including all internal
> tasks, making it easy to find and change affinity for
> them all if necessary.
> 
> > So no, I don't think this is a sensible thing to do at all.

The patch using this is here:
http://www.mail-archive.com/kvm@vger.kernel.org/msg35411.html

It simply copies the affinity from the parent when thread is created.

-- 
MST

^ permalink raw reply

* Re: [PATCH repost] sched: export sched_set/getaffinity to modules
From: Michael S. Tsirkin @ 2010-07-01 11:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Sridhar Samudrala, Tejun Heo, Oleg Nesterov, netdev,
	lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
	Jiri Kosina, Thomas Gleixner, Andi Kleen
In-Reply-To: <1277984603.1917.15.camel@laptop>

On Thu, Jul 01, 2010 at 01:43:23PM +0200, Peter Zijlstra wrote:
> On Thu, 2010-07-01 at 13:19 +0200, Peter Zijlstra wrote:
> > On Thu, 2010-07-01 at 14:07 +0300, Michael S. Tsirkin wrote:
> > > Author: Sridhar Samudrala <sri@us.ibm.com>
> > > 
> > > sched: export sched_set/getaffinity to modules
> > > 
> > > vhost-net driver wants to copy the affinity from the
> > > owner thread to thread it creates. Export
> > > sched_set/get affinity to modules to make this possible
> > > when vhost is built as a module.
> 
> > Urgh,.. so why again is that a good idea?
> 
> In particular:
>  - who sets the affinity of the task? 

management tools do this when they start qemu.

>  - why can't it set the kernel thread's affinity too?

It can. However: the threads are started internally by the driver
when qemu does an ioctl.  What we want to do is give it a sensible
default affinity. management tool can later tweak it if it wants to.

>  - what happens if someone changes the tasks' affinity?

We would normally create a cgroup including all internal
tasks, making it easy to find and change affinity for
them all if necessary.

> So no, I don't think this is a sensible thing to do at all.

^ permalink raw reply

* Re: [PATCH net-next-2.6] ipv4: sysctl to block responding on down interface
From: Joakim Tjernlund @ 2010-07-01 11:48 UTC (permalink / raw)
  To: Andi Kleen; +Cc: David Miller, netdev, Stephen Hemminger
In-Reply-To: <871vbn1m52.fsf@basil.nowhere.org>

Andi Kleen <andi@firstfloor.org> wrote on 2010/07/01 13:23:21:
>
> Joakim Tjernlund <joakim.tjernlund@transmode.se> writes:
>
> > Stephen Hemminger <shemminger@vyatta.com> wrote on 2010/06/11 17:48:54:
> >>
> >> When Linux is used as a router, it is undesirable for the kernel to process
> >> incoming packets when the address assigned to the interface is down.
> >> The initial problem report was for a management application that used ICMP
> >> to check link availability.
> >>
> >> The default is disabled to maintain compatibility with previous behavior.
> >> This is not recommended for server systems because it makes fail over more
> >> difficult, and does not account for configurations where multiple interfaces
> >> have the same IP address.
> >>
> >> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> >
> > Ping David et. all?
> > I too want this.
>
> Doesn't arpfilter enable this already? If you set in on the still up
> interfaces those will not answer to other IP addresses.
>
> This only works on the ARP level, so it has to wait until the arp
> cache in the remote host times out.

I tried that but it didn't work, but I didn't think of clearing
the ARP cache.
Anyhow, such methods seems worse than just doing ifconfig 0.0.0.0

  Jocke


^ permalink raw reply

* Re: [PATCH repost] sched: export sched_set/getaffinity to modules
From: Peter Zijlstra @ 2010-07-01 11:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Ingo Molnar, Sridhar Samudrala, Tejun Heo, Oleg Nesterov, netdev,
	lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
	Jiri Kosina, Thomas Gleixner, Andi Kleen
In-Reply-To: <1277983179.1917.10.camel@laptop>

On Thu, 2010-07-01 at 13:19 +0200, Peter Zijlstra wrote:
> On Thu, 2010-07-01 at 14:07 +0300, Michael S. Tsirkin wrote:
> > Author: Sridhar Samudrala <sri@us.ibm.com>
> > 
> > sched: export sched_set/getaffinity to modules
> > 
> > vhost-net driver wants to copy the affinity from the
> > owner thread to thread it creates. Export
> > sched_set/get affinity to modules to make this possible
> > when vhost is built as a module.

> Urgh,.. so why again is that a good idea?

In particular:
 - who sets the affinity of the task? 
 - why can't it set the kernel thread's affinity too?
 - what happens if someone changes the tasks' affinity?

So no, I don't think this is a sensible thing to do at all.

^ permalink raw reply

* Re: [net-next-2.6 PATCH v2] x86: Align skb w/ start of cacheline on newer core 2/Xeon Arch
From: Andi Kleen @ 2010-07-01 11:28 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: davem, netdev, gospo, bphilips, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Alexander Duyck
In-Reply-To: <20100630043728.9224.64191.stgit@localhost.localdomain>

Jeff Kirsher <jeffrey.t.kirsher@intel.com> writes:


Sorry for the late comment.

>  
> +#ifdef CONFIG_MCORE2
> +/*
> + * We handle most unaligned accesses in hardware.  On the other hand
> + * unaligned DMA can be quite expensive on some Nehalem processors.
> + *
> + * Based on this we disable the IP header alignment in network drivers.
> + */
> +#define NET_IP_ALIGN	0
> +#endif
>  #endif /* _ASM_X86_SYSTEM_H */

The ifdef should be imho dropped and the option be made unconditional
for all x86. I am not aware of any x86 core where unalignment is really
slow. This would increase the chance of it actually working on many
configurations which do not necessarily optimize for Core2.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply

* Re: [PATCH net-next-2.6] ipv4: sysctl to block responding on down interface
From: Andi Kleen @ 2010-07-01 11:23 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Stephen Hemminger, David Miller, netdev
In-Reply-To: <OF62B30BCC.F8523B86-ONC1257750.00687496-C1257750.0068AA27@transmode.se>

Joakim Tjernlund <joakim.tjernlund@transmode.se> writes:

> Stephen Hemminger <shemminger@vyatta.com> wrote on 2010/06/11 17:48:54:
>>
>> When Linux is used as a router, it is undesirable for the kernel to process
>> incoming packets when the address assigned to the interface is down.
>> The initial problem report was for a management application that used ICMP
>> to check link availability.
>>
>> The default is disabled to maintain compatibility with previous behavior.
>> This is not recommended for server systems because it makes fail over more
>> difficult, and does not account for configurations where multiple interfaces
>> have the same IP address.
>>
>> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>
> Ping David et. all?
> I too want this.

Doesn't arpfilter enable this already? If you set in on the still up
interfaces those will not answer to other IP addresses.

This only works on the ARP level, so it has to wait until the arp
cache in the remote host times out.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply

* Re: [PATCH repost] sched: export sched_set/getaffinity to modules
From: Peter Zijlstra @ 2010-07-01 11:19 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Ingo Molnar, Sridhar Samudrala, Tejun Heo, Oleg Nesterov, netdev,
	lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
	Jiri Kosina, Thomas Gleixner, Andi Kleen
In-Reply-To: <20100701110708.GA27368@redhat.com>

On Thu, 2010-07-01 at 14:07 +0300, Michael S. Tsirkin wrote:
> Author: Sridhar Samudrala <sri@us.ibm.com>
> 
> sched: export sched_set/getaffinity to modules
> 
> vhost-net driver wants to copy the affinity from the
> owner thread to thread it creates. Export
> sched_set/get affinity to modules to make this possible
> when vhost is built as a module.
> 
> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> ---
> 
> I'm not sure the previous time made it clear what exactly is the
> proposed change, so reposting.  Info, Peter, could you ack merging the
> following through the net-next tree please?
> 
> diff --git a/kernel/sched.c b/kernel/sched.c
> index d484081..3759391 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -4744,6 +4744,7 @@ out_put_task:
>  	put_online_cpus();
>  	return retval;
>  }
> +EXPORT_SYMBOL_GPL(sched_setaffinity);
>  
>  static int get_user_cpu_mask(unsigned long __user *user_mask_ptr, unsigned len,
>  			     struct cpumask *new_mask)
> @@ -4807,6 +4808,7 @@ out_unlock:
>  
>  	return retval;
>  }
> +EXPORT_SYMBOL_GPL(sched_getaffinity);
>  
>  /**
>   * sys_sched_getaffinity - get the cpu affinity of a process

Urgh,.. so why again is that a good idea?

^ permalink raw reply

* [PATCH repost] sched: export sched_set/getaffinity to modules
From: Michael S. Tsirkin @ 2010-07-01 11:07 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: Sridhar Samudrala, Tejun Heo, Oleg Nesterov, netdev, lkml,
	kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev, Jiri Kosina,
	Thomas Gleixner, Andi Kleen
In-Reply-To: <20100625101022.GA16321@redhat.com>

Author: Sridhar Samudrala <sri@us.ibm.com>

sched: export sched_set/getaffinity to modules

vhost-net driver wants to copy the affinity from the
owner thread to thread it creates. Export
sched_set/get affinity to modules to make this possible
when vhost is built as a module.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

---

I'm not sure the previous time made it clear what exactly is the
proposed change, so reposting.  Info, Peter, could you ack merging the
following through the net-next tree please?

diff --git a/kernel/sched.c b/kernel/sched.c
index d484081..3759391 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4744,6 +4744,7 @@ out_put_task:
 	put_online_cpus();
 	return retval;
 }
+EXPORT_SYMBOL_GPL(sched_setaffinity);
 
 static int get_user_cpu_mask(unsigned long __user *user_mask_ptr, unsigned len,
 			     struct cpumask *new_mask)
@@ -4807,6 +4808,7 @@ out_unlock:
 
 	return retval;
 }
+EXPORT_SYMBOL_GPL(sched_getaffinity);
 
 /**
  * sys_sched_getaffinity - get the cpu affinity of a process

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox