* Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips
From: Jan Kiszka @ 2011-10-24 16:10 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Avi Kivity, Marcelo Tosatti, kvm
In-Reply-To: <20111024160526.GA30385@redhat.com>
On 2011-10-24 18:05, Michael S. Tsirkin wrote:
>> This is what I have in mind:
>> - devices set PBA bit if MSI message cannot be sent due to mask (*)
>> - core checks&clears PBA bit on unmask, injects message if bit was set
>> - devices clear PBA bit if message reason is resolved before unmask (*)
>
> OK, but practically, when exactly does the device clear PBA?
Consider a network adapter that signals messages in an RX ring: If the
corresponding vector is masked while the guest empties the ring, I
strongly assume that the device is supposed to take back the pending bit
in that case so that no interrupt is injected on a later vector
unmask operation.
Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
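
The three rules quoted above can be sketched in plain C. This is only an illustration of the proposed split between device and core, not KVM or QEMU code: struct msix_state and every function name below are hypothetical, and the state is limited to 32 vectors for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-device MSI-X state; all names are illustrative only. */
struct msix_state {
    uint32_t mask;      /* bit n set: vector n is masked */
    uint32_t pba;       /* bit n set: vector n has a pending message */
    unsigned injected;  /* number of messages delivered to the guest */
};

static void inject(struct msix_state *s, unsigned vec)
{
    (void)vec;
    s->injected++;
}

/* Core: the guest masks a vector. */
static void core_mask(struct msix_state *s, unsigned vec)
{
    assert(vec < 32);
    s->mask |= 1u << vec;
}

/* Device: send a message, or latch it in the PBA if the vector is masked. */
static void device_notify(struct msix_state *s, unsigned vec)
{
    assert(vec < 32);
    if (s->mask & (1u << vec))
        s->pba |= 1u << vec;
    else
        inject(s, vec);
}

/* Device: the reason for the message was resolved before the unmask. */
static void device_retract(struct msix_state *s, unsigned vec)
{
    assert(vec < 32);
    s->pba &= ~(1u << vec);
}

/* Core: on unmask, check and clear the PBA bit, inject if it was set. */
static void core_unmask(struct msix_state *s, unsigned vec)
{
    assert(vec < 32);
    s->mask &= ~(1u << vec);
    if (s->pba & (1u << vec)) {
        s->pba &= ~(1u << vec);
        inject(s, vec);
    }
}
```

In the RX ring scenario, the guest emptying the ring while the vector is masked corresponds to device_retract(), so the later core_unmask() finds the bit clear and injects nothing.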
^ permalink raw reply
* [PATCH] cache align vm_stat
From: Dimitri Sivanich @ 2011-10-24 16:10 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: Andrew Morton, Christoph Lameter, David Rientjes, Andi Kleen,
Mel Gorman
Avoid false sharing of the vm_stat array.
This was found to adversely affect tmpfs I/O performance.
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
---
mm/vmstat.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: linux/mm/vmstat.c
===================================================================
--- linux.orig/mm/vmstat.c
+++ linux/mm/vmstat.c
@@ -78,7 +78,7 @@ void vm_events_fold_cpu(int cpu)
*
* vm_stat contains the global counters
*/
-atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS];
+atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS] __cacheline_aligned_in_smp;
EXPORT_SYMBOL(vm_stat);
#ifdef CONFIG_SMP
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
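
As a rough user-space analogue of the one-line change above, the sketch below aligns a counter array to an assumed 64-byte cache line with C11 alignas. The names counters, NR_ITEMS and CACHE_LINE are hypothetical stand-ins; the kernel's __cacheline_aligned_in_smp uses the architecture's own line size and may differ in other details.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>

#define CACHE_LINE 64  /* assumed line size, not taken from the patch */
#define NR_ITEMS   8   /* hypothetical stand-in for NR_VM_ZONE_STAT_ITEMS */

/*
 * The array starts on a cache-line boundary, so a frequently written
 * variable that the linker places just before it cannot share the
 * array's first line.
 */
static alignas(CACHE_LINE) atomic_long counters[NR_ITEMS];

/* Returns nonzero if p sits on a cache-line boundary. */
static int starts_on_cache_line(const void *p)
{
    return ((uintptr_t)p % CACHE_LINE) == 0;
}

/* Atomically increments one counter and returns its new value. */
static long bump(int item)
{
    assert(item >= 0 && item < NR_ITEMS);
    return atomic_fetch_add(&counters[item], 1) + 1;
}
```

Alignment only fixes where the array begins. Whatever follows the array in memory can still share its last cache line unless that object is aligned or padded as well.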
^ permalink raw reply
* [PATCH 1/1] pinctrl/sirf: fix sirfsoc_get_group_pins prototype introduced in 7e570f97
From: Jean-Christophe PLAGNIOL-VILLARD @ 2011-10-24 16:11 UTC (permalink / raw)
To: linux-arm-kernel
Signed-off-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Stephen Warren <swarren@nvidia.com>
---
drivers/pinctrl/pinmux-sirf.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/pinctrl/pinmux-sirf.c b/drivers/pinctrl/pinmux-sirf.c
index ba73523..d76cae6 100644
--- a/drivers/pinctrl/pinmux-sirf.c
+++ b/drivers/pinctrl/pinmux-sirf.c
@@ -870,7 +870,7 @@ static const char *sirfsoc_get_group_name(struct pinctrl_dev *pctldev,
static int sirfsoc_get_group_pins(struct pinctrl_dev *pctldev, unsigned selector,
const unsigned **pins,
- const unsigned *num_pins)
+ unsigned *num_pins)
{
if (selector >= ARRAY_SIZE(sirfsoc_pin_groups))
return -EINVAL;
--
1.7.7
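
A small stand-alone sketch of why the count parameter should not point to const: the callback reports the number of pins through it. The diff does not show the function body, so the assignment below is an assumption about how such a callback typically fills its out-parameter, and struct pin_group, groups and get_group_pins are hypothetical names rather than the sirf driver's.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical pin group table, for illustration only. */
struct pin_group {
    const char *name;
    const unsigned *pins;
    unsigned num_pins;
};

static const unsigned uart_pins[] = { 4, 5 };
static const unsigned spi_pins[]  = { 10, 11, 12 };

static const struct pin_group groups[] = {
    { "uart", uart_pins, ARRAY_SIZE(uart_pins) },
    { "spi",  spi_pins,  ARRAY_SIZE(spi_pins)  },
};

/*
 * pins:     out-parameter, receives a pointer to read-only pin numbers.
 * num_pins: out-parameter, receives the count. If it were declared
 *           'const unsigned *', the store below would not compile.
 */
static int get_group_pins(unsigned selector,
                          const unsigned **pins,
                          unsigned *num_pins)
{
    assert(pins && num_pins);
    if (selector >= ARRAY_SIZE(groups))
        return -EINVAL;
    *pins = groups[selector].pins;
    *num_pins = groups[selector].num_pins;
    return 0;
}
```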
^ permalink raw reply related
* Re: [PATCH 11/X] uprobes: x86: introduce xol_was_trapped()
From: Oleg Nesterov @ 2011-10-24 16:07 UTC (permalink / raw)
To: Srikar Dronamraju
Cc: Peter Zijlstra, Ingo Molnar, Steven Rostedt, Linux-mm,
Arnaldo Carvalho de Melo, Linus Torvalds, Jonathan Corbet,
Masami Hiramatsu, Hugh Dickins, Christoph Hellwig,
Ananth N Mavinakayanahalli, Thomas Gleixner, Andi Kleen,
Andrew Morton, Jim Keniston, Roland McGrath, LKML
In-Reply-To: <20111024145531.GB31435@linux.vnet.ibm.com>
On 10/24, Srikar Dronamraju wrote:
>
> > diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
> > index 1c30cfd..f0fbdab 100644
> > --- a/arch/x86/include/asm/uprobes.h
> > +++ b/arch/x86/include/asm/uprobes.h
> > @@ -39,6 +39,7 @@ struct uprobe_arch_info {
> >
> > struct uprobe_task_arch_info {
> > unsigned long saved_scratch_register;
> > + unsigned long saved_trap_no;
> > };
> > #else
> > struct uprobe_arch_info {};
>
>
> one nit
> I had to add saved_trap_no to #else part (i.e uprobe_arch_info ).
Yes, thanks, I didn't notice this is for X86_64 only.
And just in case, please feel free to rename/redo/whatever.
Oleg.
^ permalink raw reply
* Re: [PATCH 6/6] mfd: TPS65910: Improve regulator init data
From: Kyle Manna @ 2011-10-24 16:13 UTC (permalink / raw)
To: Mark Brown
Cc: linux-kernel, Samuel Ortiz, Liam Girdwood,
Jorge Eduardo Candelaria, Graeme Gregory
In-Reply-To: <20111019140027.GH18713@sirena.org.uk>
On 10/19/2011 09:00 AM, Mark Brown wrote:
> On Tue, Oct 18, 2011 at 01:26:28PM -0500, Kyle Manna wrote:
>> Improve the interface between platform code/board files to the TPS65910
> Again, *always* CC maintainers on patches.
This was an oversight on my part.
>
>> regulators. The TWL4030/6030 code was used as an example interface.
> This isn't a good sign...
I've reviewed other PMICs (ie Wolfson Micro ;) ) and will post an
updated series with an interface similar to what is used there. The new
approach makes more sense and keeps the code/patch small.
>
>> This improved interface will allow use of the regulators without
>> specifying all the constraints. Also gets rid of an assumption that
>> the platform pass in an array of correct size and was unchecked.
> You've not described the changes between the two interfaces. Note that
> empty constraints should be absolutely fine with the API.
>
>> + if (init_data->constraints.name)
>> + pmic->desc[i].name = init_data->constraints.name;
>> + else
>> + pmic->desc[i].name = info[i].name;
> No, this is broken. The name of the regulator is a fixed property of
> the device and isn't something that ought to be overridden per system.
Understood.
>
>> + /* TPS65910 and TPS65911 Regulators */
>> + rdev = add_regulator(pmic, info, TPS65910_REG_VRTC,
>> + pmic_plat_data->vrtc);
>> + if (IS_ERR(rdev))
>> + return PTR_ERR(rdev);
>> + rdev = add_regulator(pmic, info, TPS65910_REG_VIO,
>> + pmic_plat_data->vio);
>> +
>> + if (IS_ERR(rdev))
>> + return PTR_ERR(rdev);
>> +
>> + rdev = add_regulator(pmic, info, TPS65910_REG_VDD1,
>> + pmic_plat_data->vdd1);
>> + if (IS_ERR(rdev))
>> + return PTR_ERR(rdev);
> This looks like a regression - we've gone from looping over an array
> which is nice and simple to explicit code for each individual regulator
> giving us lots of repetitive code...
Will be revised.
>> -err_unregister_regulator:
>> - while (--i>= 0)
>> - regulator_unregister(pmic->rdev[i]);
>> - kfree(pmic->rdev);
> ...and losing all our cleanup if things go wrong which isn't great
> either.
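
The two review points above (loop over an array instead of per-regulator code, and keep the unwind path) can be sketched together. This is a user-space illustration under stated assumptions: reg_register(), reg_unregister(), probe_all() and the fail_at hook are hypothetical and only mimic the shape of the regulator API, and the error code is arbitrary.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define NUM_REGS 4

struct regulator { int id; };

static int live;          /* regulators currently registered */
static int fail_at = -1;  /* hypothetical hook: index at which registration fails */

/* Stand-in for a registration call; returns NULL on failure. */
static struct regulator *reg_register(int id)
{
    struct regulator *r;

    if (id == fail_at)
        return NULL;
    r = malloc(sizeof(*r));
    if (!r)
        return NULL;
    r->id = id;
    live++;
    return r;
}

static void reg_unregister(struct regulator *r)
{
    free(r);
    live--;
}

/* Register every regulator in one loop; on failure undo the ones done so far. */
static int probe_all(struct regulator **rdev)
{
    int i;

    assert(rdev != NULL);
    for (i = 0; i < NUM_REGS; i++) {
        rdev[i] = reg_register(i);
        if (!rdev[i])
            goto err_unregister;
    }
    return 0;

err_unregister:
    while (--i >= 0)
        reg_unregister(rdev[i]);
    return -ENODEV;
}
```

The unwind loop mirrors the err_unregister label removed by the patch under review: i is the index that failed, so only the entries below it are released.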
^ permalink raw reply
* Re: [PATCH 5/6] IIO:hwmon interface client driver.
From: Jonathan Cameron @ 2011-10-24 16:15 UTC (permalink / raw)
To: guenter.roeck
Cc: linux-kernel@vger.kernel.org, linux-iio@vger.kernel.org,
linus.ml.walleij@gmail.com, zdevai@gmail.com,
linux@arm.linux.org.uk, arnd@arndb.de,
broonie@opensource.wolfsonmicro.com, gregkh@suse.de,
lm-sensors@lm-sensors.org, khali@linux-fr.org,
thomas.petazzoni@free-electrons.com,
maxime.ripard@free-electrons.com
In-Reply-To: <1319472607.2583.49.camel@groeck-laptop>
On 10/24/11 17:10, Guenter Roeck wrote:
> On Mon, 2011-10-24 at 11:58 -0400, Jonathan Cameron wrote:
>> On 10/24/11 16:39, Guenter Roeck wrote:
>>> On Mon, 2011-10-24 at 06:09 -0400, Jonathan Cameron wrote:
>>> [ ... ]
>>>>>>> +/*
>>>>>>> + * Assumes that IIO and hwmon operate in the same base units.
>>>>>>> + * This is supposed to be true, but needs verification for
>>>>>>> + * new channel types.
>>>>>>> + */
>>>>>>> +static ssize_t iio_hwmon_read_val(struct device *dev,
>>>>>>> + struct device_attribute *attr,
>>>>>>> + char *buf)
>>>>>>> +{
>>>>>>> + long result;
>>>>>>> + int val, ret, scaleint, scalepart;
>>>>>>> + struct sensor_device_attribute *sattr = to_sensor_dev_attr(attr);
>>>>>>> + struct iio_hwmon_state *state = dev_get_drvdata(dev);
>>>>>>> +
>>>>>>> + /*
>>>>>>> + * No locking between this pair, so theoretically possible
>>>>>>> + * the scale has changed.
>>>>>>> + */
>>>>>>> + ret = iio_read_channel_raw(state->channels[sattr->index],
>>>>>>> + &val);
>>>>>>> + if (ret < 0)
>>>>>>> + return ret;
>>>>>>> +
>>>>>>> + ret = iio_read_channel_scale(state->channels[sattr->index],
>>>>>>> + &scaleint, &scalepart);
>>>>>>> + if (ret < 0)
>>>>>>> + return ret;
>>>>>>> + switch (ret) {
>>>>>>> + case IIO_VAL_INT:
>>>>>>> + result = val * scaleint;
>>>>>>> + break;
>>>>>>> + case IIO_VAL_INT_PLUS_MICRO:
>>>>>>> + result = (long)val * (long)scaleint +
>>>>>>> + (long)val * (long)scalepart / 1000000L;
>>>>>>> + break;
>>>>>>> + case IIO_VAL_INT_PLUS_NANO:
>>>>>>> + result = (long)val * (long)scaleint +
>>>>>>> + (long)val * (long)scalepart / 1000000000L;
>>>>>>> + break;
>>>>>>
>>>>>> Still easy to imagine that val * scalepart gets larger than 2147483647L
>>>>>> (on machines where sizeof(long) = 4) ... it will already happen if the
>>>>>> result of (val * scalepart / 1000000000) is larger than 2.
>>>>> Good point. I really ought to have done the calcs.
>>>>> If we have maximum possible value in here things will be ugly.
>>>>>
>>>>> Worst case is scalepart is 999999999. (could be done as 1 - 0.000000001
>>>>> which would be nicer, but we don't specify a preference - from this
>>>>> discussion I am suspecting we should!)
>>>>>
>>>>> Looks like 64 bits is going to be a requirement as you say.
>>>>>>
>>>>>> What value range do you expect to see here ?
>>>>>>
>>>>>> If (val * scaleint) is already the milli-unit, scalepart would possibly
>>>>>> only address fractions of milli-units. If so, the result of (val *
>>>>>> scalepart / 1000000000L) might always be smaller than 1, ie 0.
>>>>> It certainly should be.
>>>>>> If so, for the calculation to have any value, you might be better off using
>>>>>> DIV_ROUND_CLOSEST(val * scalepart, 1000000000L).
>>>>> Good idea.
>>>>>>
>>>>>> I am a bit confused by this anyway. Since hwmon in general reports
>>>>>> milli-units, VAL_INT appears to reflect milli-units, VAL_INT_PLUS_MICRO
>>>>>> really means nano-units, and IIO_VAL_INT_PLUS_NANO really means
>>>>>> pico-units. Is this correct ?
>>>>> Micro units of the scale factor.
>>>>>
>>>>> Take my test part a max1363...
>>>>> Scale is actually 0.5 so each adc count (i.e. raw value) is 0.5 millivolts.
>>>>>
>>>>> scale int here is 0,
>>>>> scale part is 500,000 (so 0.5) and it returns IIO_VAL_INT_PLUS_MICRO.
>>>>
>>>> How about the following? It'll be extremely costly, but this isn't exactly
>>>> a fast path!
>>>>
>>>> case IIO_VAL_INT_PLUS_MICRO:
>>>> result = (s64)val * (s64)scaleint +
>>>> div_s64((s64)val * (s64)scalepart, 1000000LL);
>>>> break;
>>>> case IIO_VAL_INT_PLUS_NANO:
>>>> result = (s64)val * (s64)scaleint +
>>>> div_s64((s64)val * (s64)scalepart, 1000000000LL);
>>>> break;
>>>
>>> Is div_s64 really necessary, or would
>>>
>>> result = (long)val * (long)scaleint +
>>> DIV_ROUND_CLOSEST((s64)val * (s64)scalepart,
>>> 1000000000LL);
>>>
>>> work as well ?
>> Not if you want it to compile on arm v5 by the look of it.
>>
>> ERROR: "__aeabi_ldivmod" [drivers/staging/iio/iio_hwmon.ko] undefined!
>>
> Annoying. Ok, I don't have a better idea than using div_s64. You don't
> need s64 for the first part of the operation (val * scaleint), though,
> since the result is a long.
True enough. Pretty unlikely we are going to have 2 MV hwmon devices any
time soon. I'll pop that back down to int * int I think!
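
The arithmetic the thread converges on can be checked in user space. The sketch below is an assumption-laden illustration, not the driver code: the enum values are stand-ins for the IIO_VAL_* return codes, scale_raw() is a hypothetical helper, and plain 64-bit division is used where kernel code would call div_s64(), since plain division is what produced the __aeabi_ldivmod link error on 32-bit ARM.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for IIO_VAL_INT, IIO_VAL_INT_PLUS_MICRO, IIO_VAL_INT_PLUS_NANO. */
enum scale_type { VAL_INT, VAL_INT_PLUS_MICRO, VAL_INT_PLUS_NANO };

/*
 * Applies scale = scaleint + scalepart / 10^6 (or 10^9) to a raw reading.
 * The fractional product is computed in 64 bits because val * scalepart
 * can exceed the range of a 32-bit long.
 */
static long scale_raw(int val, int scaleint, int scalepart, enum scale_type type)
{
    int64_t frac;

    switch (type) {
    case VAL_INT:
        return (long)val * scaleint;
    case VAL_INT_PLUS_MICRO:
        frac = (int64_t)val * scalepart / 1000000;
        break;
    case VAL_INT_PLUS_NANO:
        frac = (int64_t)val * scalepart / 1000000000;
        break;
    default:
        assert(!"unknown scale type");
        return 0;
    }
    return (long)val * scaleint + (long)frac;
}
```

With the max1363 numbers from the thread (scaleint 0, scalepart 500000, micro), a raw reading of 1000 scales to 500. With a raw reading of 4095 and a nano scalepart of 999999999, the intermediate product is 4094999995905, which does not fit in 32 bits, and the result truncates to 4094.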
^ permalink raw reply
* Re: [PATCH] scheduler rate controller
From: George Dunlap @ 2011-10-24 16:17 UTC (permalink / raw)
To: Lv, Hui
Cc: Duan, Jiangang, Tian, Kevin, xen-devel@lists.xensource.com,
keir@xen.org, Dong, Eddie
In-Reply-To: <C10D3FB0CD45994C8A51FEC1227CE22F340768D793@shsmsx502.ccr.corp.intel.com>
On Mon, Oct 24, 2011 at 4:36 AM, Lv, Hui <hui.lv@intel.com> wrote:
>
> As one of the topics presented in Xen summit2011 in SC, we proposed one method scheduler rate controller (SRC) to control high frequency of scheduling under some conditions. You can find the slides at
> http://www.slideshare.net/xen_com_mgr/9-hui-lvtacklingthemanagementchallengesofserverconsolidationonmulticoresystems
>
> In the followings, we have tested it with 2-socket multi-core system with many rounds and got the positive results and improve the performance greatly either with the consolidation workload SPECvirt_2010 or some small workloads such as sysbench and SPECjbb. So I posted it here for review.
>
> From Xen scheduling mechanism, hypervisor kicks related VCPUs by raising schedule softirq during processing external interrupts. Therefore, if the number of IRQ is very large, the scheduling happens more frequently. Frequent scheduling will
> 1) bring more overhead for hypervisor and
> 2) increase cache miss rate.
>
> In our consolidation workloads, SPECvirt_sc2010, SR-IOV & iSCSI solution are adopted to bypass software emulation but bring heavy network traffic. Correspondingly, 15k scheduling happened per second on each physical core, which means the average running time is very short, only 60us. We proposed SRC in XEN to mitigate this problem.
> The performance benefits brought by this patch is very huge at peak throughput with no influence when system loads are low.
>
> SRC improved SPECvirt performance by 14%.
> 1)It reduced CPU utilization, which allows more load to be added.
> 2)Response time (QoS) became better at the same CPU %.
> 3)The better response time allowed us to push the CPU % at peak performance to an even higher level (CPU was not saturated in SPECvirt).
> SRC reduced context switch rate significantly, resulting in
> 1)Smaller Path Length
> 2)Fewer cache misses thus lower CPI
> 3)Better performance for both Guest and Hypervisor sides.
>
> With this patch, from our SPECvirt_sc2010 results, the performance of xen catches up the other open sourced hypervisor.
Hui,
Thanks for the patch, and the work you've done testing it. There are
a couple of things to discuss.
* I'm not sure I like the idea of doing this at the generic level rather than
at the specific scheduler level -- e.g., inside of credit1. For
better or for worse, all aspects of scheduling work together, and even
small changes tend to have a significant effect on the emergent
behavior. I understand why you'd want this in the generic scheduling
code; but it seems like it would be better for each scheduler to
implement a rate control independently.
* The actual algorithm you use here isn't described. It seems to be
as follows (please correct me if I've made a mistake
reverse-engineering the algorithm):
Every 10ms, check to see if there have been more than 50 schedules.
If so, disable pre-emption entirely for 10ms, allowing processes to
run without being interrupted (unless they yield).
It seems like we should be able to do better. For one, it means in
the general case you will flip back and forth between really frequent
schedules and less frequent schedules. For two, turning off
preemption entirely will mean that whatever vcpu happens to be running
could, if it wished, run for the full 10ms; and which one got elected
to do that would be really random. This may work well for SPECvirt,
but it's the kind of algorithm that is likely to have some workloads
on which it works very poorly. Finally, there's the chance that this
algorithm could be "gamed" -- i.e., if a rogue VM knew that most other
VMs yielded frequently, it might be able to arrange that there would
always be more than 50 context switches a second, while it runs
without preemption and takes up more than its fair share.
Have you tried just making it give each vcpu a minimum amount of
scheduling time, say, 500us or 1ms?
Now a couple of stylistic comments:
* src tends to make me think of "source". I think sched_rate[_*]
would fit the existing naming convention better.
* src_controller() shouldn't call continue_running() directly.
Instead, scheduler() should call src_controller(); and only call
sched->do_schedule() if src_controller() returns false (or something
like that).
* Whatever the algorithm is should have comments describing what it
does and how it's supposed to work.
* Your patch is malformed; you need to have it apply at the top level,
not from within the xen/ subdirectory. The easiest way to get a patch
is to use either mercurial queues, or "hg diff". There are some good
suggestions for making and posting patches here:
http://wiki.xensource.com/xenwiki/SubmittingXenPatches
Thanks again for all your work on this -- we definitely want Xen to
beat the other open-source hypervisor. :-)
-George
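
For comparison, here is a sketch of the two policies discussed in this reply: the window-based rate control as reverse-engineered above, and the suggested per-vcpu minimum run time. It is not the patch's code. The struct, function names and the 1ms constant are hypothetical, and time is passed in as plain microseconds rather than read from a Xen clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WINDOW_US       10000  /* 10ms observation window */
#define SCHED_THRESHOLD 50     /* schedules per window that trigger suppression */
#define MIN_RUN_US      1000   /* assumed minimum run time for policy B */

struct window_ctl {
    uint64_t window_start_us;
    unsigned count;   /* schedules seen in the current window */
    bool suppress;    /* preemption disabled for the current window */
};

/*
 * Policy A: if the previous window saw more than SCHED_THRESHOLD
 * schedules, refuse preemption for the whole of the next window.
 * Returns true if a schedule request at now_us may preempt.
 */
static bool window_may_preempt(struct window_ctl *c, uint64_t now_us)
{
    assert(now_us >= c->window_start_us);
    if (now_us - c->window_start_us >= WINDOW_US) {
        c->suppress = c->count > SCHED_THRESHOLD;
        c->count = 0;
        c->window_start_us = now_us;
    }
    if (c->suppress)
        return false;
    c->count++;
    return true;
}

/* Policy B: a vcpu may be preempted once it has run for MIN_RUN_US. */
static bool min_run_may_preempt(uint64_t run_start_us, uint64_t now_us)
{
    return now_us - run_start_us >= MIN_RUN_US;
}
```

Policy A alternates between unrestricted and fully suppressed windows, because a suppressed window counts no schedules and therefore always re-enables preemption. Policy B bounds each vcpu's uninterrupted run time individually, so no single vcpu can hold the core for a full window.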
^ permalink raw reply
* Re: [PATCH 13/X] uprobes: introduce UTASK_SSTEP_TRAPPED logic
From: Oleg Nesterov @ 2011-10-24 16:13 UTC (permalink / raw)
To: Ananth N Mavinakayanahalli
Cc: Srikar Dronamraju, Peter Zijlstra, Ingo Molnar, Steven Rostedt,
Linux-mm, Arnaldo Carvalho de Melo, Linus Torvalds,
Jonathan Corbet, Masami Hiramatsu, Hugh Dickins,
Christoph Hellwig, Thomas Gleixner, Andi Kleen, Andrew Morton,
Jim Keniston, Roland McGrath, LKML
In-Reply-To: <20111024151614.GA6034@in.ibm.com>
On 10/24, Ananth N Mavinakayanahalli wrote:
>
> On Mon, Oct 24, 2011 at 04:41:27PM +0200, Oleg Nesterov wrote:
> >
> > Agreed! it would be nice to "hide" these int3's if we dump the core, but
> > I think this is a bit off-topic. It makes sense to do this in any case,
> > even if the core-dumping was triggered by another thread/insn. It makes
> > sense to remove all int3's, not only at regs->ip location. But how can
> > we do this? This is nontrivial.
>
> I don't think that is a problem.. see below...
>
> > And. Even worse. Suppose that you do "gdb probed_application". Now you
> > see int3's in the disassemble output. What can we do?
>
> In this case, nothing.
>
> > I think we can do nothing, at least currently. This just reflects the
> > fact that uprobe connects to inode, not to process/mm/etc.
> >
> > What do you think?
>
> Thinking further on this, in the normal 'running gdb on a core' case, we
> won't have this problem, as the binary that we point gdb to, will be a
> pristine one, without the uprobe int3s, right?
Not sure I understand.
I meant, if we have a binary with uprobes (iow, register_uprobe() installed
uprobes into that file), then gdb will see int3's with or without the core.
Or you can add uprobe into glibc, say you can probe getpid(). Now (again,
with or without the core) disassemble shows that getpid() starts with int3.
But I guess you meant something else...
Oleg.
^ permalink raw reply
* Re: NFS4 client blocked (kernel 3.0.7)
From: Dilip Daya @ 2011-10-24 16:18 UTC (permalink / raw)
To: Trond Myklebust; +Cc: linux-nfs@vger.kernel.org, David Flynn
In-Reply-To: <1319449226.2785.7.camel@lade.trondhjem.org>
Embedded comments...below...
On Mon, 2011-10-24 at 09:40 +0000, Trond Myklebust wrote:
> On Sat, 2011-10-22 at 12:00 -0400, Dilip Daya wrote:
> > See below...
> >
> > On Sat, 2011-10-22 at 08:28 +0000, David Flynn wrote:
> > > Dear all,
> > >
> > > When mounting a solaris NFS4 export on a v3.0.4 client, we've experienced
> > > processes becoming blocked. Any further attempt to access the mountpoint
> > > from another process also blocks. Other mountpoints are unaffected.
> > > I have not identified a test case to reproduce the behaviour.
> > >
> > > Any thoughts on the matter would be most welcome,
> > >
> > > Kind regards,
> > >
> > > ..david
> > >
> > > from /proc/mounts:
> > > home:/home/ /home nfs4 rw,relatime,vers=4,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=172.29.190.20,minorversion=0,local_lock=none,addr=172.29.120.140 0 0
> > >
> > > [105121.204200] INFO: task bash:4457 blocked for more than 120 seconds.
> > > [105121.247424] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > [105121.299955] bash D ffffffff818050a0 0 4457 1 0x00000000
> > > [105121.347840] ffff8802954b5c28 0000000000000082 ffff8802954b5db8 0000000000012a40
> > > [105121.397793] ffff8802954b5fd8 0000000000012a40 ffff8802954b4000 0000000000012a40
> > > [105121.441724] 0000000000012a40 0000000000012a40 ffff8802954b5fd8 0000000000012a40
> > > [105121.441728] Call Trace:
> > > [105121.441740] [<ffffffff81110030>] ? __lock_page+0x70/0x70
> > > [105121.441744] [<ffffffff8160007c>] io_schedule+0x8c/0xd0
> > > [105121.441746] [<ffffffff8111003e>] sleep_on_page+0xe/0x20
> > > [105121.441749] [<ffffffff816008ff>] __wait_on_bit+0x5f/0x90
> > > [105121.441751] [<ffffffff81110203>] wait_on_page_bit+0x73/0x80
> > > [105121.441756] [<ffffffff81085bf0>] ? autoremove_wake_function+0x40/0x40
> > > [105121.441759] [<ffffffff8111c5e5>] ? pagevec_lookup_tag+0x25/0x40
> > > [105121.441761] [<ffffffff81110436>] filemap_fdatawait_range+0xf6/0x1a0
> > > [105121.441786] [<ffffffffa023a7d0>] ? nfs_destroy_directcache+0x20/0x20 [nfs]
> > > [105121.441789] [<ffffffff8111bae1>] ? do_writepages+0x21/0x40
> > > [105121.441791] [<ffffffff811116bb>] ? __filemap_fdatawrite_range+0x5b/0x60
> > > [105121.441793] [<ffffffff8111050b>] filemap_fdatawait+0x2b/0x30
> > > [105121.441795] [<ffffffff81112124>] filemap_write_and_wait+0x44/0x60
> > > [105121.441803] [<ffffffffa0232805>] nfs_getattr+0x105/0x120 [nfs]
> > > [105121.441806] [<ffffffff81605e88>] ? do_page_fault+0x258/0x550
> > > [105121.441810] [<ffffffff81175b31>] vfs_getattr+0x51/0x120
> > > [105121.441812] [<ffffffff81175c70>] vfs_fstatat+0x70/0x90
> > > [105121.441814] [<ffffffff81175ccb>] vfs_stat+0x1b/0x20
> > > [105121.441816] [<ffffffff81175f14>] sys_newstat+0x24/0x40
> > > [105121.441820] [<ffffffff8101449a>] ? init_fpu+0x4a/0x150
> > > [105121.441822] [<ffffffff81602955>] ? page_fault+0x25/0x30
> > > [105121.441825] [<ffffffff8160a702>] system_call_fastpath+0x16/0x1b
> > > [105121.441837] INFO: task bash:5612 blocked for more than 120 seconds.
> > > [105121.441838] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > [105121.441840] bash D 0000000000000005 0 5612 1 0x00000000
> > > [105121.441843] ffff8801f25d5ca8 0000000000000086 ffff8800163e9b08 0000000000012a40
> > > [105121.441845] ffff8801f25d5fd8 0000000000012a40 ffff8801f25d4000 0000000000012a40
> > > [105121.441848] 0000000000012a40 0000000000012a40 ffff8801f25d5fd8 0000000000012a40
> > > [105121.441850] Call Trace:
> > > [105121.441853] [<ffffffff81110030>] ? __lock_page+0x70/0x70
> > > [105121.441855] [<ffffffff8160007c>] io_schedule+0x8c/0xd0
> > > [105121.441857] [<ffffffff8111003e>] sleep_on_page+0xe/0x20
> > > [105121.441859] [<ffffffff816008ff>] __wait_on_bit+0x5f/0x90
> > > [105121.441861] [<ffffffff81110203>] wait_on_page_bit+0x73/0x80
> > > [105121.441863] [<ffffffff81085bf0>] ? autoremove_wake_function+0x40/0x40
> > > [105121.441866] [<ffffffff8111c5e5>] ? pagevec_lookup_tag+0x25/0x40
> > > [105121.441868] [<ffffffff81110436>] filemap_fdatawait_range+0xf6/0x1a0
> > > [105121.441876] [<ffffffffa023a7d0>] ? nfs_destroy_directcache+0x20/0x20 [nfs]
> > > [105121.441878] [<ffffffff8111bae1>] ? do_writepages+0x21/0x40
> > > [105121.441880] [<ffffffff811116bb>] ? __filemap_fdatawrite_range+0x5b/0x60
> > > [105121.441882] [<ffffffff81111730>] filemap_write_and_wait_range+0x70/0x80
> > > [105121.441886] [<ffffffff8119cc6a>] vfs_fsync_range+0x5a/0x90
> > > [105121.441888] [<ffffffff8119cd0c>] vfs_fsync+0x1c/0x20
> > > [105121.441894] [<ffffffffa022ec74>] nfs_file_flush+0x54/0x80 [nfs]
> > > [105121.441898] [<ffffffff8116ee7f>] filp_close+0x3f/0x90
> > > [105121.441900] [<ffffffff8116f8a7>] sys_close+0xb7/0x120
> > > [105121.441902] [<ffffffff8160a702>] system_call_fastpath+0x16/0x1b
> > > --
> >
> > Same issue!
> >
> > In my case I have NFS client & server both with Linux kernel
> > v3.0.7-stable.
> >
> >
> > Kernel: v3.0.7-stable (amd64)
> >
> > # nfsstat -m
> > /opt/xorsyst/nfs_test from 192.168.1.53:/opt/xorsyst/nfs_test
> > Flags:
> > rw,relatime,vers=4,rsize=32768,wsize=32768,namlen=255,hard,proto=udp,port=0,timeo=600,retrans=6,sec=sys,clientaddr=192.168.1.52,minorversion=0,local_lock=none,addr=192.168.1.53
>
> Sigh... Why are you using udp with timeo!=default? You do realise that
> unlike tcp, udp is a lossy protocol with no guarantee that messages will
> actually be delivered to the server?
>
> Trond
Hi Trond,
Thank you for your response. Yes, I understand the advice against using
UDP; sorry for not providing additional background on this issue earlier:
We use an in-house test-suite-tool much like LTP to test newer kernels
(v3.0.x kernel) _before_ we release them in production. We run various
tests for 96 CHO (Continuous-Hours-of-Operation). This issue was
reported in one such test using:
v3.0.7-stable kernel on both NFS client/server (x86_64) systems:
# nfsstat -m
/opt/xorsyst/nfs_test from 192.168.1.53:/opt/xorsyst/nfs_test
Flags:
rw,relatime,vers=4,rsize=32768,wsize=32768,namlen=255,hard,proto=udp,port=0,timeo=600,retrans=6,sec=sys,clientaddr=192.168.1.52,minorversion=0,local_lock=none,addr=192.168.1.53
...BUT we've had successful completion of 96 CHO using NFS/TCP (rather
than NFS/UDP) with no issues. (Not even any task "blocked for 120
seconds" nor any NFS "server not responding" messages.)
# mount -o rw,sync,proto=tcp,timeo=600,retrans=6 192.168.1.xx:/opt/xorsyst/nfs_test /opt/xorsyst/nfs_test
Note:
We've had no testing issues (NFS/UDP) with 2.6.32 based kernels which are in production at this time.
Status update:
At this time I have a system in this state, i.e.
- The "df" command hangs: it shows local filesystems but hangs when listing NFS mounts.
- Our tests continue, now at 48 hours, with only the one NFS/UDP issue reported above.
- I issued "echo 0 >/proc/sys/sunrpc/rpc_debug"; unfortunately all the PIDs involved
in the reported backtraces no longer exist, so I will have to wait for another occurrence.
=> Is there other data that I should collect?
=> Are there any patches that I could apply to v3.0.7 to retry my test?
Thanking you in advance.
-DilipD.
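When comparing the UDP and TCP setups above, it can help to pull the transport and timeout out of an `nfsstat -m` Flags line mechanically. The following is only a minimal sketch and not part of any NFS tooling: the helper name `parse_nfs_flags` is hypothetical, and the conversion assumes the nfs(5) convention that `timeo` is given in tenths of a second.

```python
def parse_nfs_flags(flags_line):
    """Split a comma-separated NFS mount option string into a dict.

    Options without '=' (e.g. 'rw', 'hard') are stored as True.
    """
    opts = {}
    for item in flags_line.strip().split(","):
        if not item:
            continue
        key, sep, value = item.partition("=")
        opts[key] = value if sep else True
    return opts


# Flags line copied from the report above.
FLAGS = ("rw,relatime,vers=4,rsize=32768,wsize=32768,namlen=255,hard,"
         "proto=udp,port=0,timeo=600,retrans=6,sec=sys,"
         "clientaddr=192.168.1.52,minorversion=0,local_lock=none,"
         "addr=192.168.1.53")

opts = parse_nfs_flags(FLAGS)
# Assumption: timeo is expressed in deciseconds, as documented in nfs(5).
timeo_seconds = int(opts["timeo"]) / 10
print(opts["proto"], timeo_seconds, opts["retrans"])
```

Under that assumption, the reported mount waits 60 seconds before each UDP retransmission, which is the combination Trond questions.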
^ permalink raw reply
* Re: [Qemu-devel] gcc auto-omit-frame-pointer vs msvc longjmp
From: Kai Tietz @ 2011-10-24 16:18 UTC (permalink / raw)
To: Bob Breuer
Cc: xunxun, Richard Henderson, qemu-devel, Mark Cave-Ayland,
gcc@gcc.gnu.org
In-Reply-To: <4EA57A26.1050806@mc.net>
2011/10/24 Bob Breuer <breuerr@mc.net>:
> Kai Tietz wrote:
>> Hi,
>>
>> For trunk-version I have a tentative patch for this issue. On 4.6.x
>> and older branches this doesn't work, as there we can't differentiate
>> that easily between ms- and sysv-abi.
>>
>> But could somebody give this patch a try?
>>
>> Regards,
>> Kai
>>
>> ChangeLog
>>
>> * config/i386/i386.c (ix86_frame_pointer_required): Enforce use of
>> frame-pointer for 32-bit ms-abi, if setjmp is used.
>>
>> Index: i386.c
>> ===================================================================
>> --- i386.c (revision 180099)
>> +++ i386.c (working copy)
>> @@ -8391,6 +8391,10 @@
>> if (SUBTARGET_FRAME_POINTER_REQUIRED)
>> return true;
>>
>> + /* For older 32-bit runtimes setjmp requires valid frame-pointer. */
>> + if (TARGET_32BIT_MS_ABI && cfun->calls_setjmp)
>> + return true;
>> +
>> /* In ix86_option_override_internal, TARGET_OMIT_LEAF_FRAME_POINTER
>> turns off the frame pointer by default. Turn it back on now if
>> we've not got a leaf function. */
>>
>
> For a gcc 4.7 snapshot, this does fix the longjmp problem that I
> encountered. So aside from specifying -fno-omit-frame-pointer for
> affected files, what can be done for 4.6?
>
> Bob
Well, for 4.6.x (or older) we can simply use the mingw32.h header in
gcc/config/i386/ and define a subtarget macro there to indicate that.
The only point of incompatibility might be Wine, which uses the
Linux compiler to build Windows-related code.
I have attached a possible patch for 4.6 gcc versions to this mail.
Regards,
Kai
Index: mingw32.h
===================================================================
--- mingw32.h (revision 180393)
+++ mingw32.h (working copy)
@@ -239,3 +239,8 @@
/* We should find a way to not have to update this manually. */
#define LIBGCJ_SONAME "libgcj" /*LIBGCC_EH_EXTN*/ "-12.dll"
+/* For 32-bit Windows we need valid frame-pointer for function using
+ setjmp. */
+#define SUBTARGET_SETJMP_NEED_FRAME_POINTER \
+ (!TARGET_64BIT && cfun->calls_setjmp)
+
Index: i386.c
===================================================================
--- i386.c (revision 180393)
+++ i386.c (working copy)
@@ -8741,6 +8741,12 @@
if (SUBTARGET_FRAME_POINTER_REQUIRED)
return true;
+#ifdef SUBTARGET_SETJMP_NEED_FRAME_POINTER
+ /* For older 32-bit runtimes setjmp requires valid frame-pointer. */
+ if (SUBTARGET_SETJMP_NEED_FRAME_POINTER)
+ return true;
+#endif
+
/* In ix86_option_override_internal, TARGET_OMIT_LEAF_FRAME_POINTER
turns off the frame pointer by default. Turn it back on now if
we've not got a leaf function. */
^ permalink raw reply
* Re: [Qemu-devel] [PULL v3 00/13] allow tools to use the QEMU main loop
From: Anthony Liguori @ 2011-10-24 16:19 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel
In-Reply-To: <1319214405-20388-1-git-send-email-pbonzini@redhat.com>
On 10/21/2011 11:26 AM, Paolo Bonzini wrote:
> The following changes since commit c76eaf13975130768070ecd2d4f3107eb69ab757:
>
> hw/9pfs: Fix broken compilation caused by wrong trace events (2011-10-20 15:30:59 -0500)
>
> are available in the git repository at:
> git://github.com/bonzini/qemu.git split-main-loop-for-anthony
Pulled. Thanks.
Regards,
Anthony Liguori
> This patch series makes the QEMU main loop usable outside of the executable,
> and especially in tools and possibly unit tests. This is cleaner because
> it avoids introducing partial transitions to GIOChannel. Interfacing with
> the glib main loop is still possible.
>
> The main loop code is currently split in cpus.c and vl.c. Moving it
> to a new file is easy; the problem is that the main loop depends on the
> timer infrastructure in qemu-timer.c, and that file currently contains
> the implementation of icount and the vm_clock. This is bad from the
> perspective of linking qemu-timer.c into the tools. Luckily, it is
> relatively easy to untie them and move them out of the way. This is
> what the largest part of the series does (patches 1-9).
>
> Patches 10-13 complete the refactoring and cleanup some surrounding
> code.
>
> v2->v3
> Rebased, added documentation
>
> v1->v2
> Rebased
>
> Paolo Bonzini (13):
> remove unused function
> qemu-timer: remove active_timers array
> qemu-timer: move common code to qemu_rearm_alarm_timer
> qemu-timer: more clock functions
> qemu-timer: move icount to cpus.c
> qemu-timer: do not refer to runstate_is_running()
> qemu-timer: use atexit for quit_timers
> qemu-timer: move more stuff out of qemu-timer.c
> qemu-timer: do not use RunState change handlers
> main-loop: create main-loop.h
> main-loop: create main-loop.c
> Revert to a hand-made select loop
> simplify main loop functions
>
> Makefile.objs | 2 +-
> async.c | 1 +
> cpus.c | 497 ++++++++++++++++++++++++++++---------------------
> cpus.h | 3 +-
> exec-all.h | 14 ++
> exec.c | 3 -
> hw/mac_dbdma.c | 5 -
> hw/mac_dbdma.h | 1 -
> iohandler.c | 55 +------
> main-loop.c | 495 ++++++++++++++++++++++++++++++++++++++++++++++++
> main-loop.h | 351 ++++++++++++++++++++++++++++++++++
> os-win32.c | 123 ------------
> qemu-char.h | 12 +-
> qemu-common.h | 37 +----
> qemu-coroutine-lock.c | 1 +
> qemu-os-posix.h | 4 -
> qemu-os-win32.h | 17 +--
> qemu-timer.c | 489 +++++++++---------------------------------------
> qemu-timer.h | 31 +---
> savevm.c | 25 +++
> slirp/libslirp.h | 11 -
> sysemu.h | 3 +-
> vl.c | 189 ++++---------------
> 23 files changed, 1309 insertions(+), 1060 deletions(-)
> create mode 100644 main-loop.c
> create mode 100644 main-loop.h
>
^ permalink raw reply
* Re: [Qemu-devel] [PULL 00/19] Block patches
From: Anthony Liguori @ 2011-10-24 16:19 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel
In-Reply-To: <1319217556-28273-1-git-send-email-kwolf@redhat.com>
On 10/21/2011 12:18 PM, Kevin Wolf wrote:
> The following changes since commit c2e2343e1faae7bbc77574c12a25881b1b696808:
>
> hw/arm_gic.c: Fix save/load of irq_target array (2011-10-21 17:19:56 +0200)
>
> are available in the git repository at:
> git://repo.or.cz/qemu/kevin.git for-anthony
Pulled. Thanks.
Regards,
Anthony Liguori
>
> Alex Jia (1):
> fix memory leak in aio_write_f
>
> Kevin Wolf (5):
> xen_disk: Always set feature-barrier = 1
> fdc: Fix floppy port I/O
> qemu-img: Don't allow preallocation and compression at the same time
> qcow2: Fix bdrv_write_compressed error handling
> pc: Fix floppy drives with if=none
>
> Paolo Bonzini (12):
> sheepdog: add coroutine_fn markers
> add socket_set_block
> block: rename bdrv_co_rw_bh
> block: unify flush implementations
> block: add bdrv_co_discard and bdrv_aio_discard support
> vmdk: fix return values of vmdk_parent_open
> vmdk: clean up open
> block: add a CoMutex to synchronous read drivers
> block: take lock around bdrv_read implementations
> block: take lock around bdrv_write implementations
> block: change flush to co_flush
> block: change discard to co_discard
>
> Stefan Hajnoczi (1):
> block: drop redundant bdrv_flush implementation
>
> block.c | 258 ++++++++++++++++++++++++++++++-------------------
> block.h | 5 +
> block/blkdebug.c | 6 -
> block/blkverify.c | 9 --
> block/bochs.c | 15 +++-
> block/cloop.c | 15 +++-
> block/cow.c | 34 ++++++-
> block/dmg.c | 15 +++-
> block/nbd.c | 28 +++++-
> block/parallels.c | 15 +++-
> block/qcow.c | 17 +---
> block/qcow2-cluster.c | 6 +-
> block/qcow2.c | 72 ++++++--------
> block/qed.c | 6 -
> block/raw-posix.c | 23 +----
> block/raw-win32.c | 4 +-
> block/raw.c | 23 ++---
> block/rbd.c | 4 +-
> block/sheepdog.c | 14 ++--
> block/vdi.c | 6 +-
> block/vmdk.c | 82 ++++++++++------
> block/vpc.c | 34 ++++++-
> block/vvfat.c | 28 +++++-
> block_int.h | 9 +-
> hw/fdc.c | 14 +++
> hw/fdc.h | 9 ++-
> hw/pc.c | 25 +++--
> hw/pc.h | 3 +-
> hw/pc_piix.c | 5 +-
> hw/xen_disk.c | 5 +-
> oslib-posix.c | 7 ++
> oslib-win32.c | 6 +
> qemu-img.c | 11 ++
> qemu-io.c | 1 +
> qemu_socket.h | 1 +
> trace-events | 1 +
> 36 files changed, 524 insertions(+), 292 deletions(-)
>
>
^ permalink raw reply
* Re: ceph on non-btrfs file systems
From: Christian Brunner @ 2011-10-24 16:22 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
In-Reply-To: <Pine.LNX.4.64.1110231739380.25255@cobra.newdream.net>
Thanks for explaining this. I don't have any objections to btrfs
as an OSD filesystem. Even the fact that there is no btrfs-fsck doesn't
scare me, since I can use the ceph replication to recover a lost
btrfs filesystem. The only problem I have is that btrfs is not stable
on our side, and I wonder what you are doing to make it work. (Maybe
it's related to the load pattern of using ceph as a backend store for
qemu.)
Here is a list of the btrfs problems I'm having:
- When I run ceph with the default configuration (btrfs snaps enabled)
I can see a rapid increase in Disk-I/O after a few hours of uptime.
Btrfs-cleaner is using more and more time in
btrfs_clean_old_snapshots().
- When I run ceph with btrfs snaps disabled, the situation is getting
slightly better. I can run an OSD for about 3 days without problems,
but then again the load increases. This time, I can see that the
ceph-osd (blkdev_issue_flush) and btrfs-endio-wri are doing more work
than usual.
Another thing is that I'm seeing a WARNING: at fs/btrfs/inode.c:2114
from time to time. Maybe it's related to the performance issues, but
I haven't been able to verify this.
It's really sad to see that ceph performance and stability are
suffering that much from the underlying filesystems and that this
hasn't changed over the last months.
Kind regards,
Christian
2011/10/24 Sage Weil <sage@newdream.net>:
> Although running on ext4, xfs, or whatever other non-btrfs you want mostly
> works, there are a few important remaining issues:
>
> 1- ext4 limits total xattrs to 4KB. This can cause problems in some
> cases, as Ceph uses xattrs extensively. Most of the time we don't hit
> this. We do hit the limit with radosgw pretty easily, though, and may
> also hit it in exceptional cases where the OSD cluster is very unhealthy.
>
> There is a large xattr patch for ext4 from the Lustre folks that has been
> floating around for (I think) years. Maybe as interest grows in running
> Ceph on ext4 this can move upstream.
>
> Previously we were being forgiving about large setxattr failures on ext3,
> but we found that was leading to corruption in certain cases (because we
> couldn't set our internal metadata), so the next release will assert/crash
> in that case (fail-stop instead of fail-maybe-eventually-corrupt).
>
> XFS does not have an xattr size limit and thus does not have this problem.
>
> 2- The other problem is with OSD journal replay of non-idempotent
> transactions. On non-btrfs backends, the Ceph OSDs use a write-ahead
> journal. After restart, the OSD does not know exactly which transactions
> in the journal may have already been committed to disk, and may reapply a
> transaction again during replay. For most operations (write, delete,
> truncate) this is fine.
>
> Some operations, though, are non-idempotent. The simplest example is
> CLONE, which copies (efficiently, on btrfs) data from one object to
> another. If the source object is modified, the osd restarts, and then
> the clone is replayed, the target will get incorrect (newer) data. For
> example,
>
> 1- clone A -> B
> 2- modify A
> <osd crash, replay from 1>
>
> B will get new instead of old contents.
>
> (This doesn't happen on btrfs because the snapshots allow us to replay
> from a known consistent point in time.)
>
> For things like clone, skipping the operation if the target exists almost
> works, except for cases like
>
> 1- clone A -> B
> 2- modify A
> ...
> 3- delete B
> <osd crash, replay from 1>
>
> (Although in that example who cares if B had bad data; it was removed
> anyway.) The larger problem, though, is that that doesn't always work;
> CLONERANGE copies a range of a file from A to B, where B may already
> exist.
>
> In practice, the higher level interfaces don't make full use of the
> low-level interface, so it's possible some solution exists that carefully
> avoids the problem with a partial solution in the lower layer. This makes
> me nervous, though, as it is easy to break.
>
> Another possibility:
>
> - on non-btrfs, we set a xattr on every modified object with the
> op_seq, the unique sequence number for the transaction.
> - for any (potentially) non-idempotent operation, we fsync() before
> continuing to the next transaction, to ensure that xattr hits disk.
> - on replay, we skip a transaction if the xattr indicates we already
> performed this transaction.
>
> Because every 'transaction' only modifies a single object (file),
> this ought to work. It'll make things like clone slow, but let's face it:
> they're already slow on non-btrfs file systems because they actually copy
> the data (instead of duplicating the extent refs in btrfs). And it should
> make the full ObjectStore interface safe, without upper layers having to
> worry about the kinds and orders of transactions they perform.
>
> Other ideas?
>
> This issue is tracked at http://tracker.newdream.net/issues/213.
>
> sage
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
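The op_seq idea quoted above (record the sequence number of the last applied transaction on each object and skip already-applied transactions on replay) can be illustrated with a toy model. This is only a sketch under stated assumptions, not Ceph code: `ObjectStore`, `apply`, and `run` are hypothetical names, a plain dictionary stands in for the per-object xattr, and the fsync() step that makes the xattr durable is not modelled at all.

```python
class ObjectStore:
    """Toy in-memory object store with a per-object op_seq guard."""

    def __init__(self):
        self.data = {}    # object name -> contents
        self.op_seq = {}  # object name -> last applied op_seq (stands in for the xattr)

    def apply(self, seq, op, guard=True):
        """Apply one transaction; op is (kind, target, argument)."""
        kind, target, arg = op
        # On replay, skip if the target already recorded this transaction.
        if guard and self.op_seq.get(target, -1) >= seq:
            return False
        if kind == "write":
            self.data[target] = arg
        elif kind == "clone":
            # arg is the source object; not idempotent if the source changes later.
            self.data[target] = self.data[arg]
        else:
            raise ValueError("unknown operation: %r" % (kind,))
        self.op_seq[target] = seq
        return True


def run(guard):
    """Replay the 'clone A -> B, modify A, crash, replay from 1' scenario."""
    store = ObjectStore()
    store.apply(0, ("write", "A", "old"), guard)
    journal = [(1, ("clone", "B", "A")), (2, ("write", "A", "new"))]
    for seq, op in journal:   # normal operation
        store.apply(seq, op, guard)
    for seq, op in journal:   # OSD restart: replay the journal from transaction 1
        store.apply(seq, op, guard)
    return store.data["B"]


guarded = run(guard=True)
unguarded = run(guard=False)
print(guarded, unguarded)
```

With the guard, B keeps the contents it had when the clone was first applied; without it, the replayed clone copies the already-modified A, which is the failure described in the quoted text.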
^ permalink raw reply
* Re: [PATCH 1/1] gst-plugins-good: correctly handle gconf schema
From: Richard Purdie @ 2011-10-24 16:18 UTC (permalink / raw)
To: Patches and discussions about the oe-core layer
In-Reply-To: <64effb846c8f9b1e6fe57a7e213d24d046439b99.1319135920.git.josh@linux.intel.com>
On Thu, 2011-10-20 at 11:40 -0700, Joshua Lock wrote:
> Add the shipped gconf schema to the gconfelements package and inherit the gconf
> class so that schema processing is handled via post* scripts.
>
> Signed-off-by: Joshua Lock <josh@linux.intel.com>
> ---
> .../gstreamer/gst-plugins-good_0.10.30.bb | 6 ++++--
> 1 files changed, 4 insertions(+), 2 deletions(-)
Merged to master, thanks.
Richard
^ permalink raw reply
* Re: Linux USB HID should ignore values outside Logical Minimum/Maximum range
From: Chris Friesen @ 2011-10-24 16:24 UTC (permalink / raw)
To: Denilson Figueiredo de Sá
Cc: linux-kernel, Jiri Kosina, linux-input, linux-usb
In-Reply-To: <op.v3q34la3dsdv5o@localhost>
On 10/22/2011 05:42 AM, Denilson Figueiredo de Sá wrote:
> It may even happen to send an out-of-range value for one axis, but a
> valid value for another axis. The code should be prepared for that
> (ignore one, but keep the other).
In this case what should be used for the "invalid" axis value? The
previous value?
Chris
--
Chris Friesen
Software Developer
GENBAND
chris.friesen@genband.com
www.genband.com
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
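One possible answer to the question above is to treat each axis independently and keep the last in-range value for any axis whose new value falls outside its Logical Minimum/Maximum. The sketch below only illustrates that policy; it is not what the kernel HID code does, and `filter_report` and the axis names are hypothetical.

```python
def filter_report(previous, report, ranges):
    """Return the new axis state, ignoring out-of-range values per axis.

    previous: dict axis -> last accepted value
    report:   dict axis -> value from the incoming report
    ranges:   dict axis -> (logical_min, logical_max), both inclusive
    """
    result = dict(previous)
    for axis, value in report.items():
        logical_min, logical_max = ranges[axis]
        if logical_min <= value <= logical_max:
            result[axis] = value
        # Otherwise keep the previous value for this axis only.
    return result


ranges = {"x": (0, 255), "y": (0, 255)}
state = {"x": 10, "y": 20}
# x is out of range and is ignored; y is valid and is accepted.
state = filter_report(state, {"x": 300, "y": 42}, ranges)
print(state)
```

Whether "previous value" is the right substitute (as opposed to, say, reporting no event for that axis) is exactly the open question in this thread.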
^ permalink raw reply
* Re: [Xenomai-help] address spaces of real-time task and standard linux process
From: Thomas Lockhart @ 2011-10-24 16:25 UTC (permalink / raw)
To: haitaozhumail-disc@domain.hid; +Cc: xenomai@xenomai.org
In-Reply-To: <1319401552.9644.YahooMailNeo@domain.hid>
On 10/23/2011 01:25 PM, haitaozhumail-disc@domain.hid wrote:
> Hi All,
>
> Do a standard Linux process and a real-time task (spawned by the
> standard Linux process with rt_task_create and rt_task_start ) share the
> same address space? More specifically, I have a C++ program like this:
...
> Can the function demo() correctly access the object created in main()?
> What if pA is a smart pointer defined in Boost library?
Yes, yes, and yes (though I didn't look at the actual code, the address
space is shared).
For things like smart pointers, just make sure that someone is keeping a
reference to the object so the reference count does not go to zero.
hth
- Tom
^ permalink raw reply
* Re: Linux USB HID should ignore values outside Logical Minimum/Maximum range
From: Chris Friesen @ 2011-10-24 16:24 UTC (permalink / raw)
To: Denilson Figueiredo de Sá
Cc: linux-kernel, Jiri Kosina, linux-input, linux-usb
In-Reply-To: <op.v3q34la3dsdv5o@localhost>
On 10/22/2011 05:42 AM, Denilson Figueiredo de Sá wrote:
> It may even happen to send an out-of-range value for one axis, but a
> valid value for another axis. The code should be prepared for that
> (ignore one, but keep the other).
In this case what should be used for the "invalid" axis value? The
previous value?
Chris
--
Chris Friesen
Software Developer
GENBAND
chris.friesen@genband.com
www.genband.com
^ permalink raw reply
* Re: Converting from Raid 5 to 6
From: Mathias Burén @ 2011-10-24 16:27 UTC (permalink / raw)
To: Michael Busby; +Cc: linux-raid
In-Reply-To: <CAFsPQ__G3j3CbMDJyYO7BaJrxnPi=MAZFiRfgbruzMhiCVQYag@mail.gmail.com>
On 24 October 2011 17:03, Michael Busby <michael.a.busby@gmail.com> wrote:
> Should the speed be very slow during this process? It's a lot
> slower than a normal grow.
>
> reshape = 1.2% (25006080/1953513984) finish=12481.8min speed=2574K/sec
>
> On 24 October 2011 15:11, Mathias Burén <mathias.buren@gmail.com> wrote:
>> On 24 October 2011 14:11, Michael Busby <michael.a.busby@gmail.com> wrote:
>>> At the moment i have a raid5 setup with 5 disks, i am looking to add a
>>> 6th disk and change from raid 5 to raid 6
>>>
>>> having looked at Neil's site i have found the following command, and
>>> just want to double check this is still the recommended way of
>>> converting
>>>
>>> mdadm --grow /dev/md0 --level=6 --raid-disks=6 --backup-file=/home/md.backup
>>>
>>> also would i need to add the extra disk before or after the command?
>>>
>>> cheers
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>
>> Hi,
>>
>> I grew my 6 disk RAID5 to a 7 disk RAID6. First, add the drive. Then
>> partition it as required. Then add the drive to the array (I think
>> it'll become a spare?). Then you can grow it.
>>
>> Make sure you're using the latest mdadm tools available.
>>
>> Regards,
>> Mathias
>>
(please reply to the bottom of the email)
What CPU are you using? What are the min/max kbps settings on the md
device? What does top (or htop) show you?
/M
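As a sanity check on the reshape line quoted above, the finish estimate can be recomputed from the other numbers in it. This is a rough sketch, assuming the two block counters and the speed figure share the same 1K unit; the variable names are illustrative only.

```python
# Values copied from the quoted reshape status line.
done_blocks = 25006080
total_blocks = 1953513984
speed_k_per_sec = 2574

percent_done = 100 * done_blocks / total_blocks
remaining_minutes = (total_blocks - done_blocks) / speed_k_per_sec / 60
remaining_days = remaining_minutes / (60 * 24)

print(f"{percent_done:.2f}% done, about {remaining_minutes:.0f} min "
      f"({remaining_days:.1f} days) remaining")
```

The result is within a few minutes of the reported `finish=12481.8min`, so the estimate is consistent with the reported speed: at roughly 2.5 MB/s the reshape would take more than a week. The min/max settings asked about are normally found in `/proc/sys/dev/raid/speed_limit_min` and `/proc/sys/dev/raid/speed_limit_max`.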
^ permalink raw reply
* [U-Boot] [PATCH] [BUG] arm, lib: fix compile breakage
From: Albert ARIBAUD @ 2011-10-24 16:28 UTC (permalink / raw)
To: u-boot
In-Reply-To: <1319434952-10971-1-git-send-email-hs@denx.de>
Hi Heiko,
On 24/10/2011 07:42, Heiko Schocher wrote:
> since commit dc8bbea0170eb2aca428ea221c91fc2e5e11f199 building
> arch/arm/lib/board.c breaks if CONFIG_CMD_NET is defined.
> Fix this.
>
> Signed-off-by: Heiko Schocher<hs@denx.de>
> Cc: Albert ARIBAUD<albert.u.boot@aribaud.net>
> Cc: Simon Glass<sjg@chromium.org>
> ---
> arch/arm/lib/board.c | 3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/arch/arm/lib/board.c b/arch/arm/lib/board.c
> index ad02dbd..c1a3f2c 100644
> --- a/arch/arm/lib/board.c
> +++ b/arch/arm/lib/board.c
> @@ -440,6 +440,9 @@ void board_init_r(gd_t *id, ulong dest_addr)
> #if !defined(CONFIG_SYS_NO_FLASH)
> ulong flash_size;
> #endif
> +#if defined(CONFIG_CMD_NET)
> + char *s;
> +#endif
>
> gd = id;
>
Applied to u-boot-arm/master, thanks.
Kind regards,
--
Albert.
^ permalink raw reply