All of lore.kernel.org

All of lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH 2/4] mfd: palmas: Reset the POWERHOLD mux during power off
From: Lee Jones @ 2016-11-09 16:14 UTC (permalink / raw)
  To: Keerthy
  Cc: tony, robh+dt, linux-omap, linux-kernel, devicetree, linux-gpio,
	nm, t-kristo
In-Reply-To: <1477559414-12520-3-git-send-email-j-keerthy@ti.com>

On Thu, 27 Oct 2016, Keerthy wrote:

> POWERHOLD signal has higher priority  over the DEV_ON bit.
> So power off will not happen if the POWERHOLD is held high.
> Hence reset the MUX to GPIO_7 mode to release the POWERHOLD
> and the DEV_ON bit to take effect to power off the PMIC.
> 
> PMIC Power off happens in dire situations like thermal shutdown
> so irrespective of the POWERHOLD setting go ahead and turn off
> the powerhold.  Currently poweroff is broken on boards that have
> powerhold enabled. This fixes poweroff on those boards.
> 
> Signed-off-by: Keerthy <j-keerthy@ti.com>
> ---
>  drivers/mfd/palmas.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/drivers/mfd/palmas.c b/drivers/mfd/palmas.c
> index 8f8bacb..8fbc5e0 100644
> --- a/drivers/mfd/palmas.c
> +++ b/drivers/mfd/palmas.c
> @@ -430,10 +430,28 @@ static void palmas_power_off(void)
>  {
>  	unsigned int addr;
>  	int ret, slave;
> +	struct device_node *node;
> +	bool override_powerhold;
>  
>  	if (!palmas_dev)

Can this happen?

>  		return;
>  
> +	node = palmas_dev->dev->of_node;

Just do:

struct device_node *np = palmas_dev->dev->of_node;

> +	override_powerhold = of_property_read_bool(node,
> +					"ti,palmas-override-powerhold");

Break the line after the '=' instead.

> +	if (override_powerhold) {

if (of_property_read_bool(node,	"ti,palmas-override-powerhold"))

Then remove 'override_powerhold'.

> +		addr = PALMAS_BASE_TO_REG(PALMAS_PU_PD_OD_BASE,
> +					  PALMAS_PRIMARY_SECONDARY_PAD2);
> +		slave = PALMAS_BASE_TO_SLAVE(PALMAS_PU_PD_OD_BASE);
> +
> +		ret = regmap_update_bits(palmas_dev->regmap[slave], addr,
> +					 PALMAS_PRIMARY_SECONDARY_PAD2_GPIO_7_MASK, 0);
> +		if (ret)
> +			pr_err("%s: Unable to write PALMAS_PRIMARY_SECONDARY_PAD2 %d\n",
> +			       __func__, ret);

Don't use __func__ in live code.

And use dev_err();


> +	}
> +
>  	slave = PALMAS_BASE_TO_SLAVE(PALMAS_PMU_CONTROL_BASE);
>  	addr = PALMAS_BASE_TO_REG(PALMAS_PMU_CONTROL_BASE, PALMAS_DEV_CTRL);
>  

-- 
Lee Jones
Linaro STMicroelectronics Landing Team Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog

^ permalink raw reply

* Re: [Qemu-devel] [PATCH v4 0/6] jobs: fix transactional race condition
From: Jeff Cody @ 2016-11-09 16:11 UTC (permalink / raw)
  To: John Snow; +Cc: qemu-block, kwolf, vsementsov, stefanha, pbonzini, qemu-devel
In-Reply-To: <1478587839-9834-1-git-send-email-jsnow@redhat.com>

On Tue, Nov 08, 2016 at 01:50:33AM -0500, John Snow wrote:
> There are a few problems with transactional job completion right now.
> 
> First, if jobs complete so quickly they complete before remaining jobs
> get a chance to join the transaction, the completion mode can leave well
> known state and the QLIST can get corrupted and the transactional jobs
> can complete in batches or phases instead of all together.
> 
> Second, if two or more jobs defer to the main loop at roughly the same
> time, it's possible for one job's cleanup to directly invoke the other
> job's cleanup from within the same thread, leading to a situation that
> will deadlock the entire transaction.
> 
> Thanks to Vladimir for pointing out these modes of failure.
> 
> ===
> v4:
> ===
> 
> Key:
> [----] : patches are identical
> [####] : number of functional differences between upstream/downstream patch
> [down] : patch is downstream-only
> The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively
> 
> 001/6:[----] [--] 'blockjob: fix dead pointer in txn list'
> 002/6:[----] [--] 'blockjob: add .clean property'
> 003/6:[----] [--] 'blockjob: add .start field'
> 004/6:[0021] [FC] 'blockjob: add block_job_start'
> 005/6:[0010] [FC] 'blockjob: refactor backup_start as backup_job_create'
> 006/6:[----] [--] 'iotests: add transactional failure race test'
> 
> 04: Fix command tracers (Kevin)
>     Implement the ability to 'start' a 'paused' job (Kevin, Jeff)
> 05: Replace superfluous conditionals with assertions. (Kevin, Jeff)
>

You forgot to add my R-b's :)  I can add them when applying, though.

> ===
> v3:
> ===
> 
> - Rebase to origin/master, requisite patches now upstream.
> 
> ===
> v2:
> ===
> 
> - Correct Vladimir's email (Sorry!)
> - Add test as a variant of an existing test [Vladimir]
> 
> ________________________________________________________________________________
> 
> For convenience, this branch is available at:
> https://github.com/jnsnow/qemu.git branch job-fix-race-condition
> https://github.com/jnsnow/qemu/tree/job-fix-race-condition
> 
> This version is tagged job-fix-race-condition-v4:
> https://github.com/jnsnow/qemu/releases/tag/job-fix-race-condition-v4
> 
> John Snow (5):
>   blockjob: add .clean property
>   blockjob: add .start field
>   blockjob: add block_job_start
>   blockjob: refactor backup_start as backup_job_create
>   iotests: add transactional failure race test
> 
> Vladimir Sementsov-Ogievskiy (1):
>   blockjob: fix dead pointer in txn list
> 
>  block/backup.c               | 63 +++++++++++++++++++---------------
>  block/commit.c               |  6 ++--
>  block/mirror.c               |  7 ++--
>  block/replication.c          | 12 ++++---
>  block/stream.c               |  6 ++--
>  block/trace-events           |  6 ++--
>  blockdev.c                   | 81 ++++++++++++++++++++++++++++----------------
>  blockjob.c                   | 58 ++++++++++++++++++++++++-------
>  include/block/block_int.h    | 23 +++++++------
>  include/block/blockjob.h     |  9 +++++
>  include/block/blockjob_int.h | 11 ++++++
>  tests/qemu-iotests/124       | 53 +++++++++++++++++++----------
>  tests/qemu-iotests/124.out   |  4 +--
>  tests/test-blockjob-txn.c    | 12 +++----
>  14 files changed, 228 insertions(+), 123 deletions(-)
> 
> -- 
> 2.7.4
> 

^ permalink raw reply

* Re: [PATCH for-4.8] x86/svm: Don't clobber eax and edx if an RDMSR intercept fails
From: Boris Ostrovsky @ 2016-11-09 16:10 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel; +Cc: Wei Liu, Suravee Suthikulpanit, Jan Beulich
In-Reply-To: <1478694507-26060-1-git-send-email-andrew.cooper3@citrix.com>

On 11/09/2016 07:28 AM, Andrew Cooper wrote:
> The original code has a bug; eax and edx get unconditionally updated even when
> hvm_msr_read_intercept() doesn't return X86EMUL_OKAY.
>
> It is only by blind luck (vmce_rdmsr() eagerly initialising its msr_content
> pointer) that this isn't an information leak into guests.
>
> While fixing this bug, reduce the scope of msr_content and initialise it to 0.
> This makes it obvious that a stack leak won't occur, even if there were to be
> a buggy codepath in hvm_msr_read_intercept().
>
> Also make some non-functional improvements.  Make the insn_len calculation
> common, and reduce the quantity of explicit casting by making better use of
> the existing register names.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> CC: Wei Liu <wei.liu2@citrix.com>

Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

(with or without changes suggested by Jan)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply

* Re: [PATCH net-next v2 2/7] vxlan: simplify exception handling
From: Jiri Benc @ 2016-11-09 16:10 UTC (permalink / raw)
  To: Pravin B Shelar; +Cc: netdev
In-Reply-To: <1478371557-71888-3-git-send-email-pshelar@ovn.org>

On Sat,  5 Nov 2016 11:45:52 -0700, Pravin B Shelar wrote:
> @@ -2058,7 +2059,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>  		err = vxlan_build_skb(skb, &rt->dst, sizeof(struct iphdr),
>  				      vni, md, flags, udp_sum);
>  		if (err < 0)
> -			goto xmit_tx_error;
> +			goto tx_error;

Seems you're leaking rt here?

> @@ -2117,11 +2118,9 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>  		skb_scrub_packet(skb, xnet);
>  		err = vxlan_build_skb(skb, ndst, sizeof(struct ipv6hdr),
>  				      vni, md, flags, udp_sum);
> -		if (err < 0) {
> -			dst_release(ndst);
> -			dev->stats.tx_errors++;
> -			return;
> -		}
> +		if (err < 0)
> +			goto tx_error;

And ndst here?

 Jiri

^ permalink raw reply

* Re: [PATCH 1/2] stacktrace: fix print_stack_trace printing timestamp twice
From: Andrey Ryabinin @ 2016-11-09 16:10 UTC (permalink / raw)
  To: Andrey Konovalov, Alexander Potapenko, Dmitry Vyukov, kasan-dev,
	linux-mm, linux-kernel, mingo
  Cc: kcc
In-Reply-To: <9df5bd889e1b980d84aa41e7010e622005fd0665.1478632698.git.andreyknvl@google.com>

On 11/08/2016 10:37 PM, Andrey Konovalov wrote:
> Right now print_stack_trace prints timestamp twice, the first time
> it's done by printk when printing spaces, the second - by print_ip_sym.
> As a result, stack traces in KASAN reports have double timestamps:
> [   18.822232] Allocated by task 3838:
> [   18.822232]  [   18.822232] [<ffffffff8107e236>] save_stack_trace+0x16/0x20
> [   18.822232]  [   18.822232] [<ffffffff81509bd6>] save_stack+0x46/0xd0
> [   18.822232]  [   18.822232] [<ffffffff81509e4b>] kasan_kmalloc+0xab/0xe0
> ....
> 
> Fix by calling printk only once.
> 
> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>


Right, since commit 4bcc595ccd80 ("printk: reinstate KERN_CONT for printing continuation lines")
printk requires KERN_CONT to continue log messages, and print_ip_sym() doesn't have it.

After a small nit bellow fixed:
	Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>

> ---
>  kernel/stacktrace.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/stacktrace.c b/kernel/stacktrace.c
> index b6e4c16..56f510f 100644
> --- a/kernel/stacktrace.c
> +++ b/kernel/stacktrace.c
> @@ -14,13 +14,15 @@
>  void print_stack_trace(struct stack_trace *trace, int spaces)
>  {
>  	int i;
> +	unsigned long ip;

This can be inside for loop.
>  
>  	if (WARN_ON(!trace->entries))
>  		return;
>  
>  	for (i = 0; i < trace->nr_entries; i++) {
> -		printk("%*c", 1 + spaces, ' ');
> -		print_ip_sym(trace->entries[i]);
> +		ip = trace->entries[i];
> +		printk("%*c[<%p>] %pS\n", 1 + spaces, ' ',
> +				(void *) ip, (void *) ip);
>  	}
>  }
>  EXPORT_SYMBOL_GPL(print_stack_trace);
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 1/4] Documentation: pinctrl: palmas: Add ti,palmas-powerhold-override property definition
From: Lee Jones @ 2016-11-09 16:10 UTC (permalink / raw)
  To: Keerthy
  Cc: tony, robh+dt, linux-omap, linux-kernel, devicetree, linux-gpio,
	nm, t-kristo
In-Reply-To: <1477559414-12520-2-git-send-email-j-keerthy@ti.com>

On Thu, 27 Oct 2016, Keerthy wrote:

> GPIO7 is configured in POWERHOLD mode which has higher priority
> over DEV_ON bit and keeps the PMIC supplies on even after the DEV_ON
> bit is turned off. This property enables driver to over ride the
> POWERHOLD value to GPIO7 so as to turn off the PMIC in power off
> scenarios.
> 
> Signed-off-by: Keerthy <j-keerthy@ti.com>
> ---
>  Documentation/devicetree/bindings/pinctrl/pinctrl-palmas.txt | 9 +++++++++
>  1 file changed, 9 insertions(+)

This requires a DT Ack.

> diff --git a/Documentation/devicetree/bindings/pinctrl/pinctrl-palmas.txt b/Documentation/devicetree/bindings/pinctrl/pinctrl-palmas.txt
> index caf297b..c28d4eb8 100644
> --- a/Documentation/devicetree/bindings/pinctrl/pinctrl-palmas.txt
> +++ b/Documentation/devicetree/bindings/pinctrl/pinctrl-palmas.txt
> @@ -35,6 +35,15 @@ Optional properties:
>  - ti,palmas-enable-dvfs2: Enable DVFS2. Configure pins for DVFS2 mode.
>  	Selection primary or secondary function associated to GPADC_START
>  	and SYSEN2 pin/pad for DVFS2 interface
> +- ti,palmas-override-powerhold: This is applicable for PMICs for which
> +	GPIO7 is configured in POWERHOLD mode which has higher priority
> +	over DEV_ON bit and keeps the PMIC supplies on even after the DEV_ON
> +	bit is turned off. This property enables driver to over ride the
> +	POWERHOLD value to GPIO7 so as to turn off the PMIC in power off
> +	scenarios. So for GPIO7 if ti,palmas-override-powerhold is set
> +	then the GPIO_7 field should never be muxed to anything else.
> +	It should be set to POWERHOLD by default and only in case of
> +	power off scenarios the driver will over ride the mux value.
>  
>  This binding uses the following generic properties as defined in
>  pinctrl-bindings.txt:

-- 
Lee Jones
Linaro STMicroelectronics Landing Team Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog

^ permalink raw reply

* Re: [PATCH 6/8] block: add scalable completion tracking of requests
From: Jens Axboe @ 2016-11-09 16:09 UTC (permalink / raw)
  To: Jan Kara; +Cc: axboe, linux-kernel, linux-fsdevel, linux-block, hch
In-Reply-To: <20161109090157.GZ32353@quack2.suse.cz>

On 11/09/2016 02:01 AM, Jan Kara wrote:
> On Tue 08-11-16 08:25:52, Jens Axboe wrote:
>> On 11/08/2016 06:30 AM, Jan Kara wrote:
>>> On Tue 01-11-16 15:08:49, Jens Axboe wrote:
>>>> For legacy block, we simply track them in the request queue. For
>>>> blk-mq, we track them on a per-sw queue basis, which we can then
>>>> sum up through the hardware queues and finally to a per device
>>>> state.
>>>>
>>>> The stats are tracked in, roughly, 0.1s interval windows.
>>>>
>>>> Add sysfs files to display the stats.
>>>>
>>>> Signed-off-by: Jens Axboe <axboe@fb.com>
>>>
>>> This patch looks mostly good to me but I have one concern: You track
>>> statistics in a fixed 134ms window, stats get cleared at the beginning of
>>> each window. Now this can interact with the writeback window and latency
>>> settings which are dynamic and settable from userspace - so if the
>>> writeback code observation window gets set larger than the stats window,
>>> things become strange since you'll likely miss quite some observations
>>> about read latencies. So I think you need to make sure stats window is
>>> always larger than writeback window. Or actually, why do you have something
>>> like stats window and don't leave clearing of statistics completely to the
>>> writeback tracking code?
>>
>> That's a good point, and there actually used to be a comment to that
>> effect in the code. I think the best solution here would be to make the
>> stats code mask available somewhere, and allow a consumer of the stats
>> to request a larger window.
>>
>> Similarly, we could make the stat window be driven by the consumer, as
>> you suggest.
>>
>> Currently there are two pending submissions that depend on the stats
>> code. One is this writeback series, and the other one is the hybrid
>> polling code. The latter does not really care about the window size as
>> such, since it has no monitoring window of its own, and it wants the
>> auto-clearing as well.
>>
>> I don't mind working on additions for this, but I'd prefer if we could
>> layer them on top of the existing series instead of respinning it.
>> There's considerable test time on the existing patchset. Would that work
>> for you? Especially collapsing the stats and wbt windows would require
>> some re-architecting.
>
> OK, that works for me. Actually, when thinking about this, I have one more
> suggestion: Do we really want to expose the wbt window as a sysfs tunable?
> I guess it is good for initial experiments but longer term having the wbt
> window length be a function of target read latency might be better.
> Generally you want the window length to be considerably larger than the
> target latency but OTOH not too large so that the algorithm can react
> reasonably quickly so that suggests it could really be autotuned (and we
> scale the window anyway to adapt it to current situation).

That's not a bad idea, I have thought about that as well before. We
don't need the window tunable, and you are right, it can be a function
of the desired latency.

I'll hardwire the 100msec latency window for now and get rid of the
exposed tunable. It's harder to remove sysfs files once they have made
it into the kernel...

>>> Also as a side note - nobody currently uses the mean value of the
>>> statistics. It may be faster to track just sum and count so that mean can
>>> be computed on request which will be presumably much more rare than current
>>> situation where we recompute the mean on each batch update. Actually, that
>>> way you could get rid of the batching as well I assume.
>>
>> That could be opt-in as well. The poll code uses it. And fwiw, it is
>> exposed through sysfs as well.
>
> Yeah, my point was that just doing the division in response to sysfs read
> or actual request to read the average is likely going to be less expensive
> than having to do it on each batch completion (actually, you seem to have
> that batching code only so that you don't have to do the division too
> often). Whether my suggestion is right depends on how often polling code
> actually needs to read the average...

The polling code currently does it for every IO... That is not ideal for
other purposes, I think I'm going to work on changing that to just keep
the previous window available, so we only need to read it when the stats
window changes.

With the batching, I don't see the division as a problem in micro
benchmarks. That's why I added the batching, because it did show up
before.

-- 
Jens Axboe

^ permalink raw reply

* Re: [PATCH 6/8] block: add scalable completion tracking of requests
From: Jens Axboe @ 2016-11-09 16:09 UTC (permalink / raw)
  To: Jan Kara; +Cc: axboe, linux-kernel, linux-fsdevel, linux-block, hch
In-Reply-To: <20161109090157.GZ32353@quack2.suse.cz>

On 11/09/2016 02:01 AM, Jan Kara wrote:
> On Tue 08-11-16 08:25:52, Jens Axboe wrote:
>> On 11/08/2016 06:30 AM, Jan Kara wrote:
>>> On Tue 01-11-16 15:08:49, Jens Axboe wrote:
>>>> For legacy block, we simply track them in the request queue. For
>>>> blk-mq, we track them on a per-sw queue basis, which we can then
>>>> sum up through the hardware queues and finally to a per device
>>>> state.
>>>>
>>>> The stats are tracked in, roughly, 0.1s interval windows.
>>>>
>>>> Add sysfs files to display the stats.
>>>>
>>>> Signed-off-by: Jens Axboe <axboe@fb.com>
>>>
>>> This patch looks mostly good to me but I have one concern: You track
>>> statistics in a fixed 134ms window, stats get cleared at the beginning of
>>> each window. Now this can interact with the writeback window and latency
>>> settings which are dynamic and settable from userspace - so if the
>>> writeback code observation window gets set larger than the stats window,
>>> things become strange since you'll likely miss quite some observations
>>> about read latencies. So I think you need to make sure stats window is
>>> always larger than writeback window. Or actually, why do you have something
>>> like stats window and don't leave clearing of statistics completely to the
>>> writeback tracking code?
>>
>> That's a good point, and there actually used to be a comment to that
>> effect in the code. I think the best solution here would be to make the
>> stats code mask available somewhere, and allow a consumer of the stats
>> to request a larger window.
>>
>> Similarly, we could make the stat window be driven by the consumer, as
>> you suggest.
>>
>> Currently there are two pending submissions that depend on the stats
>> code. One is this writeback series, and the other one is the hybrid
>> polling code. The latter does not really care about the window size as
>> such, since it has no monitoring window of its own, and it wants the
>> auto-clearing as well.
>>
>> I don't mind working on additions for this, but I'd prefer if we could
>> layer them on top of the existing series instead of respinning it.
>> There's considerable test time on the existing patchset. Would that work
>> for you? Especially collapsing the stats and wbt windows would require
>> some re-architecting.
>
> OK, that works for me. Actually, when thinking about this, I have one more
> suggestion: Do we really want to expose the wbt window as a sysfs tunable?
> I guess it is good for initial experiments but longer term having the wbt
> window length be a function of target read latency might be better.
> Generally you want the window length to be considerably larger than the
> target latency but OTOH not too large so that the algorithm can react
> reasonably quickly so that suggests it could really be autotuned (and we
> scale the window anyway to adapt it to current situation).

That's not a bad idea, I have thought about that as well before. We
don't need the window tunable, and you are right, it can be a function
of the desired latency.

I'll hardwire the 100msec latency window for now and get rid of the
exposed tunable. It's harder to remove sysfs files once they have made
it into the kernel...

>>> Also as a side note - nobody currently uses the mean value of the
>>> statistics. It may be faster to track just sum and count so that mean can
>>> be computed on request which will be presumably much more rare than current
>>> situation where we recompute the mean on each batch update. Actually, that
>>> way you could get rid of the batching as well I assume.
>>
>> That could be opt-in as well. The poll code uses it. And fwiw, it is
>> exposed through sysfs as well.
>
> Yeah, my point was that just doing the division in response to sysfs read
> or actual request to read the average is likely going to be less expensive
> than having to do it on each batch completion (actually, you seem to have
> that batching code only so that you don't have to do the division too
> often). Whether my suggestion is right depends on how often polling code
> actually needs to read the average...

The polling code currently does it for every IO... That is not ideal for
other purposes, I think I'm going to work on changing that to just keep
the previous window available, so we only need to read it when the stats
window changes.

With the batching, I don't see the division as a problem in micro
benchmarks. That's why I added the batching, because it did show up
before.

-- 
Jens Axboe

^ permalink raw reply

* Re: [PATCH 1/3] ipmi/bt-bmc: change compatible node to 'aspeed, ast2400-ibt-bmc'
From: Arnd Bergmann @ 2016-11-09 16:09 UTC (permalink / raw)
  To: minyard
  Cc: devicetree, arm, Benjamin Herrenschmidt, Rob Herring, olof,
	openipmi-developer, linux-arm-kernel, Joel Stanley
In-Reply-To: <c749ff0a-bc71-bc9f-0e8c-326db2cea0fb@acm.org>

On Tuesday, November 8, 2016 12:15:57 PM CET Corey Minyard wrote:
> On 11/08/2016 09:52 AM, Cédric Le Goater wrote:
> > O
> snip
> 
> >>>> While we're modifying the binding, should we add a compat string for
> >>>> the ast2500?
> >>> Well, if the change in this patch is fine for all, may be we can add
> >>> the ast2500 compat string in a followup patch ?
> >> Sounds good to me.
> > OK. So, how do we proceed with this patch ? Who would include it in its
> > tree ?
> 
> I don't have anything for 4.9 at the moment.  Arnd, if you have 
> something, can
> you take this?  Otherwise I will.
> 
> And I guess I should add:
> 
> Acked-by: Corey Minyard <cminyard@mvista.com>

Thanks, I've added it to my todo folder.

Olof, if you do fixes before I do, please pick up this patch with
Corey's Ack.

	Arnd

------------------------------------------------------------------------------
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today. http://sdm.link/xeonphi

^ permalink raw reply

* [PATCH 1/3] ipmi/bt-bmc: change compatible node to 'aspeed, ast2400-ibt-bmc'
From: Arnd Bergmann @ 2016-11-09 16:09 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <c749ff0a-bc71-bc9f-0e8c-326db2cea0fb@acm.org>

On Tuesday, November 8, 2016 12:15:57 PM CET Corey Minyard wrote:
> On 11/08/2016 09:52 AM, C?dric Le Goater wrote:
> > O
> snip
> 
> >>>> While we're modifying the binding, should we add a compat string for
> >>>> the ast2500?
> >>> Well, if the change in this patch is fine for all, may be we can add
> >>> the ast2500 compat string in a followup patch ?
> >> Sounds good to me.
> > OK. So, how do we proceed with this patch ? Who would include it in its
> > tree ?
> 
> I don't have anything for 4.9 at the moment.  Arnd, if you have 
> something, can
> you take this?  Otherwise I will.
> 
> And I guess I should add:
> 
> Acked-by: Corey Minyard <cminyard@mvista.com>

Thanks, I've added it to my todo folder.

Olof, if you do fixes before I do, please pick up this patch with
Corey's Ack.

	Arnd

^ permalink raw reply

* Re: [RFC 2/2] mmc: sdhci-pci: Use ACPI to get max frequency for Intel byt sdio host controller sub-vended by NI
From: Zach Brown @ 2016-11-09 16:08 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: ulf.hansson, linux-mmc, linux-kernel
In-Reply-To: <af8a14a2-55a5-1d44-663e-53ff20408f10@intel.com>

On Wed, Nov 09, 2016 at 03:24:24PM +0200, Adrian Hunter wrote:
> On 08/11/16 22:07, Zach Brown wrote:
> > On NI 9037 boards the max SDIO frequency is limited by trace lengths
> > and other layout choices. The max SDIO frequency is stored in an ACPI
> > table, as MXFQ.
> >
> > The driver reads the ACPI entry MXFQ during sdio_probe_slot and sets the
> > f_max field of the host with it.
> >
> > Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com>
> > Reviewed-by: Jaeden Amero <jaeden.amero@ni.com>
> > Reviewed-by: Josh Cartwright <joshc@ni.com>
> > Signed-off-by: Zach Brown <zach.brown@ni.com>
> > ---
> >  drivers/mmc/host/sdhci-pci-core.c | 30 ++++++++++++++++++++++++++++++
> >  1 file changed, 30 insertions(+)
> >
> > diff --git a/drivers/mmc/host/sdhci-pci-core.c b/drivers/mmc/host/sdhci-pci-core.c
> > index c333ce2..4ac7f16 100644
> > --- a/drivers/mmc/host/sdhci-pci-core.c
> > +++ b/drivers/mmc/host/sdhci-pci-core.c
> > @@ -27,6 +27,7 @@
> >  #include <linux/pm_runtime.h>
> >  #include <linux/mmc/slot-gpio.h>
> >  #include <linux/mmc/sdhci-pci-data.h>
> > +#include <linux/acpi.h>
> >
> >  #include "sdhci.h"
> >  #include "sdhci-pci.h"
> > @@ -377,6 +378,35 @@ static int byt_emmc_probe_slot(struct sdhci_pci_slot *slot)
> >
> >  static int ni_byt_sdio_probe_slot(struct sdhci_pci_slot *slot)
> >  {
> > +#ifdef CONFIG_ACPI
> > +	/* Get max freq from ACPI for NI hardware */
> > +	acpi_handle acpi_hdl;
> > +	acpi_status status;
> > +	struct acpi_buffer acpi_result = {
> > +		ACPI_ALLOCATE_BUFFER, NULL };
> > +	union acpi_object *acpi_buffer;
> > +	int max_freq;
> > +
> > +	status = acpi_get_handle(ACPI_HANDLE(&slot->chip->pdev->dev), "MXFQ",
> > +				 &acpi_hdl);
>
> Is "MXFQ" an object that has already been deployed or are you inventing it
> now?  In the latter case, did you consider device properties as an alternative?
>
"MXFQ" is an object that we have already deployed on some of our devices.

> > +	if (ACPI_FAILURE(status))
> > +		return  -ENODEV;
> > +
> > +	status = acpi_evaluate_object(acpi_hdl, NULL,
> > +				      NULL, &acpi_result);
> > +	if (ACPI_FAILURE(status))
> > +		return -EINVAL;
> > +
> > +	acpi_buffer = (union acpi_object *)acpi_result.pointer;
> > +
> > +	if (acpi_buffer->type != ACPI_TYPE_INTEGER)
> > +		return -EINVAL;
> > +
> > +	max_freq = acpi_buffer->integer.value;
> > +
> > +	slot->host->mmc->f_max = max_freq * 1000000;
> > +#endif
> > +
> >  	slot->host->mmc->caps |= MMC_CAP_POWER_OFF_CARD | MMC_CAP_NONREMOVABLE;
> >  	return 0;
> >  }
> >
>

^ permalink raw reply

* Re: [uml-devel] [PATCH v3] UBD Improvements Phase 1
From: James McMechan @ 2016-11-09 16:08 UTC (permalink / raw)
  To: anton.ivanov@kot-begemot.co.uk,
	user-mode-linux-devel@lists.sourceforge.net
  Cc: richard@nod.at, Anton Ivanov
In-Reply-To: <1478680606-476779-1-git-send-email-anton.ivanov@kot-begemot.co.uk>

Hi

I am not clear on the remainder/remainder_size would not a single offset parameter work? rather than copying it off and back?
also max_recs does not seem to used except to calculate max buffer size so would not using a buffer size be easier?
something like bulk_req_read( int fd, struct io_thread_req *buf, size_t len, size_t *offset)

+static int bulk_req_safe_read(
+       int fd,
+       struct io_thread_req * (*request_buffer)[],
+       struct io_thread_req **remainder,
+       int *remainder_size,
+       int max_recs
+       )
+{
+       int n = 0;
+       int res = 0;
+
+       if (*remainder_size > 0) {
+               memmove(
+                       (char *) request_buffer,
+                       (char *) remainder, *remainder_size
+               );
+               n = *remainder_size;
+       }
+
+       res = os_read_file(
+                       fd,
+                       ((char *) request_buffer) + *remainder_size,
+                       sizeof(struct io_thread_req *)*max_recs
+                               - *remainder_size

then here it would be
res = os_read_file(fd, buf+*offset,len-*offset)
Note that if this is ever multithreaded buf and offset or request_buffer/remainder/remainder_size need to be kept separate

+               );
+       if (res > 0) {
+               n += res;
+               if ((n % sizeof(struct io_thread_req *)) > 0) {
+                       /*
+                       * Read somehow returned not a multiple of dword
+                       * theoretically possible, but never observed in the
+                       * wild, so read routine must be able to handle it
+                       */

maybe this should have a WARN_ON since it should never occur?

+                       *remainder_size = n % sizeof(struct io_thread_req *);
+                       memmove(
+                               remainder,
+                               ((char *) request_buffer) +
+                                       n/sizeof(struct io_thread_req *),
+                               *remainder_size

I am pretty sure the 2nd parameter is off. I think it should be:
((char *) request_buffer) +(n/sizeof(struct io_thread_req *))*sizeof(struct io_thread_req *)
also copying it off to a shared static buffer?

+                       );
+                       n = n - *remainder_size;
+               }
+       } else {
+               n = res;
+       }
+       return n;
+}
+
 /* Called without dev->lock held, and only in interrupt context. */
 static void ubd_handler(void)
 {
-       struct io_thread_req *req;
         struct ubd *ubd;
         struct list_head *list, *next_ele;
         unsigned long flags;
         int n;
+       int count;
 
         while(1){
-               n = os_read_file(thread_fd, &req,
-                                sizeof(struct io_thread_req *));
-               if(n != sizeof(req)){

if it is being rewritten n could just be the number of complete buffers for the for loop below

+               n = bulk_req_safe_read(
+                       thread_fd,
+                       irq_req_buffer,
+                       &irq_remainder,
+                       &irq_remainder_size,
+                       UBD_REQ_BUFFER_SIZE
+               );
+               if (n < 0) {
                         if(n == -EAGAIN)
                                 break;
                         printk(KERN_ERR "spurious interrupt in ubd_handler, "
                                "err = %d\n", -n);
                         return;
                 }
-
-               blk_end_request(req->req, 0, req->length);
-               kfree(req);
+               for (count = 0; count < n/sizeof(struct io_thread_req *); count++) {
+                       blk_end_request(
+                               (*irq_req_buffer)[count]->req,
+                               0,
+                               (*irq_req_buffer)[count]->length
+                       );
+                       kfree((*irq_req_buffer)[count]);
+               }
         }
         reactivate_fd(thread_fd, UBD_IRQ);
 
@@ -1064,6 +1137,28 @@ static int __init ubd_init(void)
                 if (register_blkdev(fake_major, "ubd"))
                         return -1;
         }
+
+       irq_req_buffer = kmalloc(
+                       sizeof(struct io_thread_req *) * UBD_REQ_BUFFER_SIZE,
+                       GFP_KERNEL
+               );
+       irq_remainder = 0;
+
+       if (irq_req_buffer == NULL) {
+               printk(KERN_ERR "Failed to initialize ubd buffering\n");
+               return -1;
+       }

Ok, I am really not tracking this, the same buffer allocated twice?

+       io_req_buffer = kmalloc(
+                       sizeof(struct io_thread_req *) * UBD_REQ_BUFFER_SIZE,
+                       GFP_KERNEL
+               );
+
+       io_remainder = 0;
+
+       if (io_req_buffer == NULL) {
+               printk(KERN_ERR "Failed to initialize ubd buffering\n");
+               return -1;
+       }
         platform_driver_register(&ubd_driver);
         mutex_lock(&ubd_lock);
         for (i = 0; i < MAX_DEV; i++){



Sorry of the poor email I am using a web interface and it keeps dropping chars...

Jim
------------------------------------------------------------------------------
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today. http://sdm.link/xeonphi
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


^ permalink raw reply

* Re: [PATCH 7/8] blk-wbt: add general throttling mechanism
From: Jens Axboe @ 2016-11-09 16:07 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-kernel, linux-fsdevel, linux-block, hch
In-Reply-To: <20161109084034.GY32353@quack2.suse.cz>

On 11/09/2016 01:40 AM, Jan Kara wrote:
>>> So for devices with write cache, you will completely drain the device
>>> before waking anybody waiting to issue new requests. Isn't it too strict?
>>> In particular may_queue() will allow new writers to issue new writes once
>>> we drop below the limit so it can happen that some processes will be
>>> effectively starved waiting in may_queue?
>>
>> It is strict, and perhaps too strict. In testing, it's the only method
>> that's proven to keep the writeback caching devices in check. It will
>> round robin the writers, if we have more, which isn't necessarily a bad
>> thing. Each will get to do a burst of depth writes, then wait for a new
>> one.
>
> Well, I'm more concerned about a situation where one writer does a
> bursty write and blocks sleeping in may_queue(). Another writer
> produces a steady flow of write requests so that never causes the
> write queue to completely drain but that writer also never blocks in
> may_queue() when it starts queueing after write queue has somewhat
> drained because it never submits many requests in parallel. In such
> case the first writer would get starved AFAIU.

I see what you are saying. I can modify the logic to ensure that if we
do have a waiter, we queue up others behind it. That should get rid of
that concern.

> Also I'm not sure why such logic for devices with writeback cache is
> needed. Sure the disk is fast to accept writes but if that causes long
> read latencies, we should scale down the writeback limits so that we
> eventually end up submitting only one write request anyway -
> effectively the same thing as limit=0 - won't we?

Basically we want to avoid getting into that situation. The problem with
write caching is that it takes a while for you to notice that anything
is wrong, and when you do, you are way down in the hole. That causes the
first violations to be pretty bad.

I'm fine with playing with this logic and improving it, but I'd rather
wait for a 2nd series for that.

-- 
Jens Axboe

^ permalink raw reply

* wireshark's RPC-over-RDMA dissector
From: Chuck Lever @ 2016-11-09 16:05 UTC (permalink / raw)
  To: Linux NFS Mailing List, List Linux RDMA Mailing

Hi-

Thanks to Yan Berman, for a couple of years now we've had a basic
RPC-over-RDMA dissector in wireshark that can be used with ibdump
captures. There have been some bugs noted, but no-one has had the
cycles to dig in and address.

Recently Tom Haynes helped me set up my own wireshark build so I
could help address some of the known issues.

http://git.linux-nfs.org/?p=cel/wireshark.git;a=shortlog;h=refs/heads/rpc-rdma-fixes

Posting here for review before I take the next steps to push these
to the wireshark community. Constructive critique and other
suggestions are welcome.

The fixes so far focus on dissecting transport headers correctly.
There continue to be significant open issues with the dissector:
	• There does not appear to be any support for dissecting
	  RPC-over-RDMA on iWARP or RoCE
	• The NFS dissector does not handle portions of the XDR
	  stream that were transmitted via RDMA Read/Write
	• RPC messages conveyed via RDMA_NOMSG are not recognized
	  or dissected
	• There is no association between RDMA Reads and Writes
	  and the RPC-over-RDMA message they go with
	• A CREQ / CREP pair are needed to identify which QP
	  numbers are used for RPC-over-RDMA traffic
	• With TCP, the dissector fully outdents the RPC and NFS
	  dissection results; but with RDMA, the dissector places
	  the results in the tree under the Infiniband header
	• Not enough error detection in the dissector

--
Chuck Lever

^ permalink raw reply

* wireshark's RPC-over-RDMA dissector
From: Chuck Lever @ 2016-11-09 16:05 UTC (permalink / raw)
  To: Linux NFS Mailing List, List Linux RDMA Mailing

Hi-

Thanks to Yan Berman, for a couple of years now we've had a basic
RPC-over-RDMA dissector in wireshark that can be used with ibdump
captures. There have been some bugs noted, but no-one has had the
cycles to dig in and address.

Recently Tom Haynes helped me set up my own wireshark build so I
could help address some of the known issues.

http://git.linux-nfs.org/?p=cel/wireshark.git;a=shortlog;h=refs/heads/rpc-rdma-fixes

Posting here for review before I take the next steps to push these
to the wireshark community. Constructive critique and other
suggestions are welcome.

The fixes so far focus on dissecting transport headers correctly.
There continue to be significant open issues with the dissector:
	• There does not appear to be any support for dissecting
	  RPC-over-RDMA on iWARP or RoCE
	• The NFS dissector does not handle portions of the XDR
	  stream that were transmitted via RDMA Read/Write
	• RPC messages conveyed via RDMA_NOMSG are not recognized
	  or dissected
	• There is no association between RDMA Reads and Writes
	  and the RPC-over-RDMA message they go with
	• A CREQ / CREP pair are needed to identify which QP
	  numbers are used for RPC-over-RDMA traffic
	• With TCP, the dissector fully outdents the RPC and NFS
	  dissection results; but with RDMA, the dissector places
	  the results in the tree under the Infiniband header
	• Not enough error detection in the dissector

--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [lustre-devel] [PATCH 1/2] staging: lustre: llite: use u64 instead of loff_t in lov_object_fiemap()
From: Greg KH @ 2016-11-09 16:05 UTC (permalink / raw)
  To: lustre-devel
In-Reply-To: <B830560B-ED95-40ED-B612-39827A314B7F@intel.com>

On Tue, Nov 08, 2016 at 12:13:59PM +0000, Xu, Bobijam wrote:
> Change loff_t to u64 in lov_object_fiemap() since loff_t is a signed
> value type.
> 
> Otherwise there could be an overflow in
> drivers/staging/lustre/lustre/lov/lov_object.c:1241 lov_object_fiemap()
> warn: signed overflow undefined. 'fm_start + fm_length < fm_start'
> 
> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
> ---
>  drivers/staging/lustre/lustre/lov/lov_object.c | 38 +++++++++++++-------------
>  1 file changed, 19 insertions(+), 19 deletions(-)

Why did you not cc: the maintainers of this filesystem with these
patches?
Please resend them both and do so.

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH net-next v2 1/7] vxlan: avoid vlan processing in vxlan device.
From: Jiri Benc @ 2016-11-09 16:04 UTC (permalink / raw)
  To: Pravin B Shelar; +Cc: netdev
In-Reply-To: <1478371557-71888-2-git-send-email-pshelar@ovn.org>

On Sat,  5 Nov 2016 11:45:51 -0700, Pravin B Shelar wrote:
> VxLan device does not have special handling for vlan taging on egress.
> Therefore it does not make sense to expose vlan offloading feature.

Makes sense. Though it would be good to have a more detailed
description, I already started writing a reply that the patch changes
VXLAN behavior when I realized it doesn't.

If you're respinning for another reasons, please make the description
a bit more elaborate. Not worth resending just for this, though, IMHO.

Acked-by: Jiri Benc <jbenc@redhat.com>

^ permalink raw reply

* [GIT PULL] nvme fabrics patches for the next round of 4.9 rc
From: Jens Axboe @ 2016-11-09 16:04 UTC (permalink / raw)

In-Reply-To: <1478678520-26697-1-git-send-email-sagi@grimberg.me>

On 11/09/2016 01:02 AM, Sagi Grimberg wrote:
> Hey Jens,
>
> Mostly some stability fixes for fabrics:
> - some memleaks, correctness and verbosity fixes from Bart
> - fix command delivery during queue reconnect from Christoph
> - nvme rdma reconnect vs io race fix from Christoph and steve
> - nvme rdma queue cleanup on connection failure fix from Steve
> - nvme rdma target error flow and queue teardown fixes
> - nvme rdma queue size fix from Samuel
> - nvmet namespace rcu race fix from Sasha
>
> please pull from:
>
>   git://git.infradead.org/nvme-fabrics.git nvmf-4.9-rc

I'm scanning the list for anything that might NOT be appropriate this
late in the cycle, and I see a few:

     nvme/scsi: Remove set-but-not-used variables
     nvme-fabrics: Adjust source code indentation
     nvme-rdma: remove redundant define

I can pull this in for 4.10 without hesistation, but I don't want to
send a pull request for a fairly new feature with 3 of the commits being
cleanups for -rc5.

Let me know what you want to do.

-- 
Jens Axboe

^ permalink raw reply

* Re: [PATCH 2/4] usb: musb: Fix sleeping function called from invalid context for hdrc glue
From: Johan Hovold @ 2016-11-09 16:04 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Johan Hovold, Bin Liu, Boris Brezillon, Greg Kroah-Hartman,
	Andreas Kemnade, Felipe Balbi, George Cherian,
	Kishon Vijay Abraham I, Ivaylo Dimitrov, Ladislav Michl,
	Laurent Pinchart, Sergei Shtylyov,
	linux-usb-u79uwXL29TY76Z2rM5mHXA,
	linux-omap-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161109153953.GG14744@localhost>

On Wed, Nov 09, 2016 at 04:39:53PM +0100, Johan Hovold wrote:
> On Tue, Nov 08, 2016 at 06:26:07PM -0700, Tony Lindgren wrote:

> > @@ -2065,6 +2147,7 @@ musb_init_controller(struct device *dev, int nIrq, void __iomem *ctrl)
> >  	}
> >  
> >  	spin_lock_init(&musb->lock);
> > +	spin_lock_init(&musb->list_lock);
> >  	musb->board_set_power = plat->set_power;
> >  	musb->min_power = plat->min_power;
> >  	musb->ops = plat->platform_ops;
> > @@ -2556,6 +2639,7 @@ static int musb_suspend(struct device *dev)
> >  	struct musb	*musb = dev_to_musb(dev);
> >  	unsigned long	flags;
> >  
> > +	WARN_ON(!list_empty(&musb->pending_list));
> 
> And this also depends on anyone attempting to queue work having first
> gotten an RPM reference (so that driver core runtime resumes the device
> before calling suspend()).
> 
> But no, there's actually still a window were this could be false when
> work is queued while runtime resuming (and eventually is executed from
> musb_queue_resume_work()).

As you just pointed out that can actually never happen and corresponding
code can be removed from musb_queue_resume_work(), so scrap this last
paragraph too. :)

Johan
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 3/3] ipmi/bt-bmc: add a sysfs file to configure a maximum response time
From: Corey Minyard @ 2016-11-09 16:04 UTC (permalink / raw)
  To: Cédric Le Goater,
	openipmi-developer-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Joel Stanley
  Cc: Rob Herring, Arnd Bergmann, devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
In-Reply-To: <aa4804b0-4084-6d34-c1db-68bdb5efb20a-Bxea+6Xhats@public.gmane.org>

On 11/09/2016 08:42 AM, Cédric Le Goater wrote:
> On 11/07/2016 07:37 PM, Corey Minyard wrote:
>> Sorry, I was at Plumbers and extra busy with other stuff.  Just getting around to reviewing this.
>>
>> On 11/02/2016 02:57 AM, Cédric Le Goater wrote:
>>> We could also use an ioctl for that purpose. sysfs seems a better
>>> approach.
>>>
>>> Signed-off-by: Cédric Le Goater <clg-Bxea+6Xhats@public.gmane.org>
>>> ---
>>>    drivers/char/ipmi/bt-bmc.c | 31 +++++++++++++++++++++++++++++++
>>>    1 file changed, 31 insertions(+)
>>>
>>> diff --git a/drivers/char/ipmi/bt-bmc.c b/drivers/char/ipmi/bt-bmc.c
>>> index e751e4a754b7..d7146f0e900e 100644
>>> --- a/drivers/char/ipmi/bt-bmc.c
>>> +++ b/drivers/char/ipmi/bt-bmc.c
>>> @@ -84,6 +84,33 @@ struct bt_bmc {
>>>      static atomic_t open_count = ATOMIC_INIT(0);
>>>    +static ssize_t bt_bmc_show_response_time(struct device *dev,
>>> +                     struct device_attribute *attr,
>>> +                     char *buf)
>>> +{
>>> +    struct bt_bmc *bt_bmc = dev_get_drvdata(dev);
>>> +
>>> +    return snprintf(buf, PAGE_SIZE - 1, "%d\n", bt_bmc->response_time);
>>> +}
>>> +
>>> +static ssize_t bt_bmc_set_response_time(struct device *dev,
>>> +                    struct device_attribute *attr,
>>> +                    const char *buf, size_t count)
>>> +{
>>> +    struct bt_bmc *bt_bmc = dev_get_drvdata(dev);
>>> +    unsigned long val;
>>> +    int err;
>>> +
>>> +    err = kstrtoul(buf, 0, &val);
>>> +    if (err)
>>> +        return err;
>>> +    bt_bmc->response_time = val;
>>> +    return count;
>>> +}
>>> +
>>> +static DEVICE_ATTR(response_time, 0644,
>>> +           bt_bmc_show_response_time, bt_bmc_set_response_time);
>>> +
>>>    static u8 bt_inb(struct bt_bmc *bt_bmc, int reg)
>>>    {
>>>        return ioread8(bt_bmc->base + reg);
>>> @@ -572,6 +599,10 @@ static int bt_bmc_probe(struct platform_device *pdev)
>>>        bt_bmc_config_irq(bt_bmc, pdev);
>>>          bt_bmc->response_time = BT_BMC_RESPONSE_TIME;
>>> +    rc = device_create_file(&pdev->dev, &dev_attr_response_time);
>>> +    if (rc)
>>> +        dev_warn(&pdev->dev, "can't create response_time file\n");
>>> +
>> You added an extra space here.
> yes. I will remove it in the next version.
>
> The patch raises a few questions on the "BT Interface Capabilities" :
>
>   - should we use sysfs or a specific ioctl to configure the driver ?

I should probably say sysfs, but really I don't care.  The hard part about
ioctls is the compat, and there shouldn't be any compat issues with this
interface.  An ioctl is probably easier, especially if you add an ioctl for
the header size thing I talked about in the previous email.

The only thing that matters to the driver is the timeout.  If you want to
make the timeout per fd, then it will have to be an ioctl.

>   - should the driver handle directly the response to the "Get BT
>     Interface Capabilities" command ?

That probably belongs in userspace.  The only reason I can think of
to have it in the driver would be so the host driver can fetch it if the
BMC userspace is down, but I can't see how that's a good idea.

Hope my wishy-washy answer helps :-).

-corey

> What is your opinion ?
>
> Thanks,
>
> C.
>
>>>          if (bt_bmc->irq) {
>>>            dev_info(dev, "Using IRQ %d\n", bt_bmc->irq);
>>

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH 3/3] ipmi/bt-bmc: add a sysfs file to configure a maximum response time
From: Corey Minyard @ 2016-11-09 16:04 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <aa4804b0-4084-6d34-c1db-68bdb5efb20a@kaod.org>

On 11/09/2016 08:42 AM, C?dric Le Goater wrote:
> On 11/07/2016 07:37 PM, Corey Minyard wrote:
>> Sorry, I was at Plumbers and extra busy with other stuff.  Just getting around to reviewing this.
>>
>> On 11/02/2016 02:57 AM, C?dric Le Goater wrote:
>>> We could also use an ioctl for that purpose. sysfs seems a better
>>> approach.
>>>
>>> Signed-off-by: C?dric Le Goater <clg@kaod.org>
>>> ---
>>>    drivers/char/ipmi/bt-bmc.c | 31 +++++++++++++++++++++++++++++++
>>>    1 file changed, 31 insertions(+)
>>>
>>> diff --git a/drivers/char/ipmi/bt-bmc.c b/drivers/char/ipmi/bt-bmc.c
>>> index e751e4a754b7..d7146f0e900e 100644
>>> --- a/drivers/char/ipmi/bt-bmc.c
>>> +++ b/drivers/char/ipmi/bt-bmc.c
>>> @@ -84,6 +84,33 @@ struct bt_bmc {
>>>      static atomic_t open_count = ATOMIC_INIT(0);
>>>    +static ssize_t bt_bmc_show_response_time(struct device *dev,
>>> +                     struct device_attribute *attr,
>>> +                     char *buf)
>>> +{
>>> +    struct bt_bmc *bt_bmc = dev_get_drvdata(dev);
>>> +
>>> +    return snprintf(buf, PAGE_SIZE - 1, "%d\n", bt_bmc->response_time);
>>> +}
>>> +
>>> +static ssize_t bt_bmc_set_response_time(struct device *dev,
>>> +                    struct device_attribute *attr,
>>> +                    const char *buf, size_t count)
>>> +{
>>> +    struct bt_bmc *bt_bmc = dev_get_drvdata(dev);
>>> +    unsigned long val;
>>> +    int err;
>>> +
>>> +    err = kstrtoul(buf, 0, &val);
>>> +    if (err)
>>> +        return err;
>>> +    bt_bmc->response_time = val;
>>> +    return count;
>>> +}
>>> +
>>> +static DEVICE_ATTR(response_time, 0644,
>>> +           bt_bmc_show_response_time, bt_bmc_set_response_time);
>>> +
>>>    static u8 bt_inb(struct bt_bmc *bt_bmc, int reg)
>>>    {
>>>        return ioread8(bt_bmc->base + reg);
>>> @@ -572,6 +599,10 @@ static int bt_bmc_probe(struct platform_device *pdev)
>>>        bt_bmc_config_irq(bt_bmc, pdev);
>>>          bt_bmc->response_time = BT_BMC_RESPONSE_TIME;
>>> +    rc = device_create_file(&pdev->dev, &dev_attr_response_time);
>>> +    if (rc)
>>> +        dev_warn(&pdev->dev, "can't create response_time file\n");
>>> +
>> You added an extra space here.
> yes. I will remove it in the next version.
>
> The patch raises a few questions on the "BT Interface Capabilities" :
>
>   - should we use sysfs or a specific ioctl to configure the driver ?

I should probably say sysfs, but really I don't care.  The hard part about
ioctls is the compat, and there shouldn't be any compat issues with this
interface.  An ioctl is probably easier, especially if you add an ioctl for
the header size thing I talked about in the previous email.

The only thing that matters to the driver is the timeout.  If you want to
make the timeout per fd, then it will have to be an ioctl.

>   - should the driver handle directly the response to the "Get BT
>     Interface Capabilities" command ?

That probably belongs in userspace.  The only reason I can think of
to have it in the driver would be so the host driver can fetch it if the
BMC userspace is down, but I can't see how that's a good idea.

Hope my wishy-washy answer helps :-).

-corey

> What is your opinion ?
>
> Thanks,
>
> C.
>
>>>          if (bt_bmc->irq) {
>>>            dev_info(dev, "Using IRQ %d\n", bt_bmc->irq);
>>

^ permalink raw reply

* [lustre-devel] [PATCH v3] staging: lustre: mdc: manage number of modify RPCs in flight
From: Greg Kroah-Hartman @ 2016-11-09 16:03 UTC (permalink / raw)
  To: James Simmons
  Cc: devel, Andreas Dilger, Oleg Drokin, Gregoire Pichon,
	Linux Kernel Mailing List, Lustre Development List
In-Reply-To: <1478557403-31072-1-git-send-email-jsimmons@infradead.org>

On Mon, Nov 07, 2016 at 05:23:23PM -0500, James Simmons wrote:
> From: Gregoire Pichon <gregoire.pichon@bull.net>
> 
> This patch is the main client part of a new feature that supports
> multiple modify metadata RPCs in parallel. Its goal is to improve
> metadata operations performance of a single client, while maintening
> the consistency of MDT reply reconstruction and MDT recovery
> mechanisms.
> 
> It allows to manage the number of modify RPCs in flight within
> the client obd structure and to assign a virtual index (the tag) to
> each modify RPC to help server side cleaning of reply data.
> 
> The mdc component uses this feature to send multiple modify RPCs
> in parallel.
> 
> Changelog:
> 
> v1) Initial patch with incorrect print out of timestamp in
>     obd_mod_rpc_stats_seq_show.
> 
> v2) Fixed up obd_mod_rpc_stats_seq_show to print out in nanoseconds
> 
> v3) Add this changelog

changelog goes below the --- line, you don't want that in the final
commit, right?

Please fix up and resend.

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH] x86/cpuid: Deal with broken firmware once more
From: Peter Zijlstra @ 2016-11-09 16:03 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Charles (Chas) Williams, Sebastian Andrzej Siewior, x86, LKML,
	M. Vefa Bicakci, Borislav Petkov
In-Reply-To: <alpine.DEB.2.20.1611091613540.3501@nanos>

On Wed, Nov 09, 2016 at 04:35:51PM +0100, Thomas Gleixner wrote:
> Both ACPI and MP specifications require that the APIC id in the respective
> tables must be the same as the APIC id in CPUID. 
> 
> The kernel retrieves the physical package id from the APIC id during the
> ACPI/MP table scan and builds the physical to logical package map.
> 
> There exist Virtualbox and Xen implementations which violate the spec. As a

ISTR it was VMware, not VirtualBox, but whatever.. they're both crazy
virt stuff.

>  /*
> + * The physical to logical package id mapping is initialized from the
> + * acpi/mptables information. Make sure that CPUID actually agrees with
> + * that.
> + */
> +static void sanitize_package_id(struct cpuinfo_x86 *c)
> +{
> +#ifdef CONFIG_SMP
> +	unsigned int pkg, apicid, cpu = smp_processor_id();
> +
> +	apicid = apic->cpu_present_to_apicid(cpu);
> +	pkg = apicid >> boot_cpu_data.x86_coreid_bits;
> +
> +	if (apicid != c->initial_apicid) {
> +		pr_err(FW_BUG "CPU%u: APIC id mismatch. Firmware: %x CPUID: %x\n",
> +		       cpu, apicid, c->initial_apicid);

Should we not also 'fix' c->initial_apicid ?

> +	}
> +	if (pkg != c->phys_proc_id) {
> +		pr_err(FW_BUG "CPU%u: Using firmware package id %u instead of %u\n",
> +		       cpu, pkg, c->phys_proc_id);
> +		c->phys_proc_id = pkg;
> +	}
> +	c->logical_proc_id = topology_phys_to_logical_pkg(pkg);
> +#else
> +	c->locical_proc_id = 0;

UP FTW ;-)

> +#endif
> +}

^ permalink raw reply

* RE: [PATCH 4/5] drm/amdgpu: refine uvd 6.0 clock gate feature.
From: Deucher, Alexander @ 2016-11-09 16:03 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org; +Cc: Zhu, Rex
In-Reply-To: <1478677305-12579-4-git-send-email-Rex.Zhu-5C7GfCeVMHo@public.gmane.org>

> -----Original Message-----
> From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf
> Of Rex Zhu
> Sent: Wednesday, November 09, 2016 2:42 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Zhu, Rex
> Subject: [PATCH 4/5] drm/amdgpu: refine uvd 6.0 clock gate feature.
> 
> Change-Id: I3b665f26689dd35750e1a6521cd5fac5456f7556
> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>

Can someone make sure this doesn't regress CZ and ST?  As long as they are still ok, the patches 4, 5 are:
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

> ---
>  drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c | 112
> ++++++++++++++++++++++++++++------
>  1 file changed, 92 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
> b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
> index 00fad69..c697a73 100644
> --- a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
> @@ -42,6 +42,10 @@ static void uvd_v6_0_set_irq_funcs(struct
> amdgpu_device *adev);
>  static int uvd_v6_0_start(struct amdgpu_device *adev);
>  static void uvd_v6_0_stop(struct amdgpu_device *adev);
>  static void uvd_v6_0_set_sw_clock_gating(struct amdgpu_device *adev);
> +static int uvd_v6_0_set_clockgating_state(void *handle,
> +					  enum amd_clockgating_state state);
> +static void uvd_v6_0_enable_mgcg(struct amdgpu_device *adev,
> +				 bool enable);
> 
>  /**
>   * uvd_v6_0_ring_get_rptr - get read pointer
> @@ -151,8 +155,6 @@ static int uvd_v6_0_hw_init(void *handle)
>  	uint32_t tmp;
>  	int r;
> 
> -	amdgpu_asic_set_uvd_clocks(adev, 10000, 10000);
> -
>  	r = uvd_v6_0_start(adev);
>  	if (r)
>  		goto done;
> @@ -395,11 +397,11 @@ static int uvd_v6_0_start(struct amdgpu_device
> *adev)
>  	lmi_swap_cntl = 0;
>  	mp_swap_cntl = 0;
> 
> +	amdgpu_asic_set_uvd_clocks(adev, 10000, 10000);
> +	uvd_v6_0_set_clockgating_state(adev, AMD_CG_STATE_UNGATE);
> +	uvd_v6_0_enable_mgcg(adev, true);
>  	uvd_v6_0_mc_resume(adev);
> 
> -	/* disable clock gating */
> -	WREG32_FIELD(UVD_CGC_CTRL, DYN_CLOCK_MODE, 0);
> -
>  	/* disable interupt */
>  	WREG32_FIELD(UVD_MASTINT_EN, VCPU_EN, 0);
> 
> @@ -838,22 +840,69 @@ static int uvd_v6_0_process_interrupt(struct
> amdgpu_device *adev,
>  	return 0;
>  }
> 
> +static void uvd_v6_0_enable_clock_gating(struct amdgpu_device *adev,
> bool enable)
> +{
> +	uint32_t data1, data3;
> +
> +	data1 = RREG32(mmUVD_SUVD_CGC_GATE);
> +	data3 = RREG32(mmUVD_CGC_GATE);
> +
> +	data1 |= UVD_SUVD_CGC_GATE__SRE_MASK |
> +		     UVD_SUVD_CGC_GATE__SIT_MASK |
> +		     UVD_SUVD_CGC_GATE__SMP_MASK |
> +		     UVD_SUVD_CGC_GATE__SCM_MASK |
> +		     UVD_SUVD_CGC_GATE__SDB_MASK |
> +		     UVD_SUVD_CGC_GATE__SRE_H264_MASK |
> +		     UVD_SUVD_CGC_GATE__SRE_HEVC_MASK |
> +		     UVD_SUVD_CGC_GATE__SIT_H264_MASK |
> +		     UVD_SUVD_CGC_GATE__SIT_HEVC_MASK |
> +		     UVD_SUVD_CGC_GATE__SCM_H264_MASK |
> +		     UVD_SUVD_CGC_GATE__SCM_HEVC_MASK |
> +		     UVD_SUVD_CGC_GATE__SDB_H264_MASK |
> +		     UVD_SUVD_CGC_GATE__SDB_HEVC_MASK;
> +
> +	if (enable) {
> +		data3 |= (UVD_CGC_GATE__SYS_MASK       |
> +			UVD_CGC_GATE__UDEC_MASK      |
> +			UVD_CGC_GATE__MPEG2_MASK     |
> +			UVD_CGC_GATE__RBC_MASK       |
> +			UVD_CGC_GATE__LMI_MC_MASK    |
> +			UVD_CGC_GATE__LMI_UMC_MASK   |
> +			UVD_CGC_GATE__IDCT_MASK      |
> +			UVD_CGC_GATE__MPRD_MASK      |
> +			UVD_CGC_GATE__MPC_MASK       |
> +			UVD_CGC_GATE__LBSI_MASK      |
> +			UVD_CGC_GATE__LRBBM_MASK     |
> +			UVD_CGC_GATE__UDEC_RE_MASK   |
> +			UVD_CGC_GATE__UDEC_CM_MASK   |
> +			UVD_CGC_GATE__UDEC_IT_MASK   |
> +			UVD_CGC_GATE__UDEC_DB_MASK   |
> +			UVD_CGC_GATE__UDEC_MP_MASK   |
> +			UVD_CGC_GATE__WCB_MASK       |
> +			UVD_CGC_GATE__VCPU_MASK      |
> +			UVD_CGC_GATE__JPEG_MASK      |
> +			UVD_CGC_GATE__SCPU_MASK      |
> +			UVD_CGC_GATE__JPEG2_MASK);
> +		data3 &= ~UVD_CGC_GATE__REGS_MASK;
> +	} else {
> +		data3 = 0;
> +	}
> +
> +	WREG32(mmUVD_SUVD_CGC_GATE, data1);
> +	WREG32(mmUVD_CGC_GATE, data3);
> +}
> +
>  static void uvd_v6_0_set_sw_clock_gating(struct amdgpu_device *adev)
>  {
> -	uint32_t data, data1, data2, suvd_flags;
> +	uint32_t data, data2;
> 
>  	data = RREG32(mmUVD_CGC_CTRL);
> -	data1 = RREG32(mmUVD_SUVD_CGC_GATE);
>  	data2 = RREG32(mmUVD_SUVD_CGC_CTRL);
> 
> +
>  	data &= ~(UVD_CGC_CTRL__CLK_OFF_DELAY_MASK |
>  		  UVD_CGC_CTRL__CLK_GATE_DLY_TIMER_MASK);
> 
> -	suvd_flags = UVD_SUVD_CGC_GATE__SRE_MASK |
> -		     UVD_SUVD_CGC_GATE__SIT_MASK |
> -		     UVD_SUVD_CGC_GATE__SMP_MASK |
> -		     UVD_SUVD_CGC_GATE__SCM_MASK |
> -		     UVD_SUVD_CGC_GATE__SDB_MASK;
> 
>  	data |= UVD_CGC_CTRL__DYN_CLOCK_MODE_MASK |
>  		(1 << REG_FIELD_SHIFT(UVD_CGC_CTRL,
> CLK_GATE_DLY_TIMER)) |
> @@ -886,11 +935,8 @@ static void uvd_v6_0_set_sw_clock_gating(struct
> amdgpu_device *adev)
>  			UVD_SUVD_CGC_CTRL__SMP_MODE_MASK |
>  			UVD_SUVD_CGC_CTRL__SCM_MODE_MASK |
>  			UVD_SUVD_CGC_CTRL__SDB_MODE_MASK);
> -	data1 |= suvd_flags;
> 
>  	WREG32(mmUVD_CGC_CTRL, data);
> -	WREG32(mmUVD_CGC_GATE, 0);
> -	WREG32(mmUVD_SUVD_CGC_GATE, data1);
>  	WREG32(mmUVD_SUVD_CGC_CTRL, data2);
>  }
> 
> @@ -937,6 +983,32 @@ static void uvd_v6_0_set_hw_clock_gating(struct
> amdgpu_device *adev)
>  }
>  #endif
> 
> +static void uvd_v6_0_enable_mgcg(struct amdgpu_device *adev,
> +				 bool enable)
> +{
> +	u32 orig, data;
> +
> +	if (enable && (adev->cg_flags & AMD_CG_SUPPORT_UVD_MGCG))
> {
> +		data = RREG32_UVD_CTX(ixUVD_CGC_MEM_CTRL);
> +		data |= 0xfff;
> +		WREG32_UVD_CTX(ixUVD_CGC_MEM_CTRL, data);
> +
> +		orig = data = RREG32(mmUVD_CGC_CTRL);
> +		data |= UVD_CGC_CTRL__DYN_CLOCK_MODE_MASK;
> +		if (orig != data)
> +			WREG32(mmUVD_CGC_CTRL, data);
> +	} else {
> +		data = RREG32_UVD_CTX(ixUVD_CGC_MEM_CTRL);
> +		data &= ~0xfff;
> +		WREG32_UVD_CTX(ixUVD_CGC_MEM_CTRL, data);
> +
> +		orig = data = RREG32(mmUVD_CGC_CTRL);
> +		data &= ~UVD_CGC_CTRL__DYN_CLOCK_MODE_MASK;
> +		if (orig != data)
> +			WREG32(mmUVD_CGC_CTRL, data);
> +	}
> +}
> +
>  static int uvd_v6_0_set_clockgating_state(void *handle,
>  					  enum amd_clockgating_state state)
>  {
> @@ -947,17 +1019,17 @@ static int uvd_v6_0_set_clockgating_state(void
> *handle,
>  		return 0;
> 
>  	if (enable) {
> -		/* disable HW gating and enable Sw gating */
> -		uvd_v6_0_set_sw_clock_gating(adev);
> -	} else {
>  		/* wait for STATUS to clear */
>  		if (uvd_v6_0_wait_for_idle(handle))
>  			return -EBUSY;
> -
> +		uvd_v6_0_enable_clock_gating(adev, true);
>  		/* enable HW gates because UVD is idle */
>  /*		uvd_v6_0_set_hw_clock_gating(adev); */
> +	} else {
> +		/* disable HW gating and enable Sw gating */
> +		uvd_v6_0_enable_clock_gating(adev, false);
>  	}
> -
> +	uvd_v6_0_set_sw_clock_gating(adev);
>  	return 0;
>  }
> 
> --
> 1.9.1
> 
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply

* Re: [PATCH net-next] ibmveth: v1 calculate correct gso_size and set gso_type
From: Brian King @ 2016-11-09 16:02 UTC (permalink / raw)
  To: Jonathan Maxwell
  Cc: Eric Dumazet, Thomas Falcon, Jon Maxwell, hofrat, linux-kernel,
	jarod, netdev, paulus, Tom Herbert, Marcelo Ricardo Leitner,
	linuxppc-dev, David Miller
In-Reply-To: <CAGHK07Cvh2QnGrUg5eu_aC=cV7DaL9wJm8adZ0rv4s0DmF-tzw@mail.gmail.com>

On 11/06/2016 03:22 PM, Jonathan Maxwell wrote:
> On Thu, Nov 3, 2016 at 8:40 AM, Brian King <brking@linux.vnet.ibm.com> wrote:
>> On 10/27/2016 10:26 AM, Eric Dumazet wrote:
>>> On Wed, 2016-10-26 at 11:09 +1100, Jon Maxwell wrote:
>>>> We recently encountered a bug where a few customers using ibmveth on the
>>>> same LPAR hit an issue where a TCP session hung when large receive was
>>>> enabled. Closer analysis revealed that the session was stuck because the
>>>> one side was advertising a zero window repeatedly.
>>>>
>>>> We narrowed this down to the fact the ibmveth driver did not set gso_size
>>>> which is translated by TCP into the MSS later up the stack. The MSS is
>>>> used to calculate the TCP window size and as that was abnormally large,
>>>> it was calculating a zero window, even although the sockets receive buffer
>>>> was completely empty.
>>>>
>>>> We were able to reproduce this and worked with IBM to fix this. Thanks Tom
>>>> and Marcelo for all your help and review on this.
>>>>
>>>> The patch fixes both our internal reproduction tests and our customers tests.
>>>>
>>>> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
>>>> ---
>>>>  drivers/net/ethernet/ibm/ibmveth.c | 20 ++++++++++++++++++++
>>>>  1 file changed, 20 insertions(+)
>>>>
>>>> diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
>>>> index 29c05d0..c51717e 100644
>>>> --- a/drivers/net/ethernet/ibm/ibmveth.c
>>>> +++ b/drivers/net/ethernet/ibm/ibmveth.c
>>>> @@ -1182,6 +1182,8 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
>>>>      int frames_processed = 0;
>>>>      unsigned long lpar_rc;
>>>>      struct iphdr *iph;
>>>> +    bool large_packet = 0;
>>>> +    u16 hdr_len = ETH_HLEN + sizeof(struct tcphdr);
>>>>
>>>>  restart_poll:
>>>>      while (frames_processed < budget) {
>>>> @@ -1236,10 +1238,28 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
>>>>                                              iph->check = 0;
>>>>                                              iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl);
>>>>                                              adapter->rx_large_packets++;
>>>> +                                            large_packet = 1;
>>>>                                      }
>>>>                              }
>>>>                      }
>>>>
>>>> +                    if (skb->len > netdev->mtu) {
>>>> +                            iph = (struct iphdr *)skb->data;
>>>> +                            if (be16_to_cpu(skb->protocol) == ETH_P_IP &&
>>>> +                                iph->protocol == IPPROTO_TCP) {
>>>> +                                    hdr_len += sizeof(struct iphdr);
>>>> +                                    skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
>>>> +                                    skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
>>>> +                            } else if (be16_to_cpu(skb->protocol) == ETH_P_IPV6 &&
>>>> +                                       iph->protocol == IPPROTO_TCP) {
>>>> +                                    hdr_len += sizeof(struct ipv6hdr);
>>>> +                                    skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
>>>> +                                    skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
>>>> +                            }
>>>> +                            if (!large_packet)
>>>> +                                    adapter->rx_large_packets++;
>>>> +                    }
>>>> +
>>>>
>>>
>>> This might break forwarding and PMTU discovery.
>>>
>>> You force gso_size to device mtu, regardless of real MSS used by the TCP
>>> sender.
>>>
>>> Don't you have the MSS provided in RX descriptor, instead of guessing
>>> the value ?
>>
>> We've had some further discussions on this with the Virtual I/O Server (VIOS)
>> development team. The large receive aggregation in the VIOS (AIX based) is actually
>> being done by software in the VIOS. What they may be able to do is when performing
>> this aggregation, they could look at the packet lengths of all the packets being
>> aggregated and take the largest packet size within the aggregation unit, minus the
>> header length and return that to the virtual ethernet client which we could then stuff
>> into gso_size. They are currently assessing how feasible this would be to do and whether
>> it would impact other bits of the code. However, assuming this does end up being an option,
>> would this address the concerns here or is that going to break something else I'm
>> not thinking of?
> 
> I was discussing this with a colleague and although this is better than
> what we have so far. We wonder if there could be a corner case where
> it ends up with a smaller value than the current MSS. For example if
> the application sent a burst of small TCP packets with the PUSH
> bit set. In that case they may not be coalesced by GRO. The VIOS could

Would that be a big problem though? Other than a performance degradation
in this specific case, do you see a functional issue with this approach?

> probably be coded to detect that condition and use the previous MSS.
> But that may not necessarily be the current MSS.
> 
> The ibmveth driver passes the MSS via gso_size to the VIOS. Either as the
> 3rd argument of ibmveth_send() or via tcp_hdr(skb)->check which is
> presumably over-written when the VIOS does the TSO. Would it be possible
> to keep a copy of this value on the TX side on the VIOS before it over-written
> and then some how pass that up to the RX side along with frame and set
> gso_size to that which should be the current MSS?

That seems like it might be more difficult. Wouldn't that require the VIOS
to track the MSS on a per connection basis, since the MSS might differ
based on the source / destination? I discussed with VIOS development, and
they don't do any connection tracking where they'd be able to insert this.

Unless there is a functional issue with the approach to have the VIOS insert
the MSS based on the size of the packets received off the wire, I think that
might be our best option...

Thanks,

Brian



> 
> Regards
> 
> Jon
> 
>>
>> Unfortunately, I don't think we'd have a good way to get gso_segs set correctly as I don't
>> see how that would get passed back up the interface.
>>
>> Thanks,
>>
>> Brian
>>
>>
>> --
>> Brian King
>> Power Linux I/O
>> IBM Linux Technology Center
>>
> 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.