* Re: [PATCH 1/3] libceph: move linger requests sooner in kick_requests()
From: Sage Weil @ 2012-12-28 0:54 UTC (permalink / raw)
To: Alex Elder; +Cc: ceph-devel
In-Reply-To: <50DCD706.9030601@inktank.com>
On Thu, 27 Dec 2012, Alex Elder wrote:
> The kick_requests() function is called by ceph_osdc_handle_map()
> when an osd map change has been indicated. Its purpose is to
> re-queue any request whose target osd is different from what it
> was when it was originally sent.
>
> It is structured as two loops, one for incomplete but registered
> requests, and a second for handling completed linger requests.
> As a special case, in the first loop if a request marked to linger
> has not yet completed, it is moved from the request list to the
> linger list. This is a quick and dirty way to have the second
> loop handle sending the request along with all the other linger
> requests.
>
> Because of the way it's done now, however, this quick and dirty
> solution can result in these incomplete linger requests never
> getting re-sent as desired. The problem lies in the fact that
> the second loop only arranges for a linger request to be sent
> if it appears its target osd has changed. This is the proper
> handling for *completed* linger requests (it avoids issuing
> the same linger request twice to the same osd).
>
> But although the linger requests added to the list in the first loop
> may have been sent, they have not yet completed, so they need to be
> re-sent regardless of whether their target osd has changed.
>
> The first required fix is that we need to avoid calling __map_request()
> on any incomplete linger request. Otherwise the subsequent
> __map_request() call in the second loop will find the target osd
> has not changed and will therefore not re-send the request.
>
> Second, we need to be sure that a sent but incomplete linger request
> gets re-sent. If the target osd is the same with the new osd map as
> it was when the request was originally sent, this won't happen.
> This can be fixed through careful handling when we move these
> requests from the request list to the linger list, by unregistering
> the request *before* it is registered as a linger request. This
> works because a side-effect of unregistering the request is to make
> the request's r_osd pointer be NULL, and *that* will ensure the
> second loop actually re-sends the linger request.
>
> Processing of such a request is done at that point, so continue with
> the next one once it's been moved.
>
> Signed-off-by: Alex Elder <elder@inktank.com>
> ---
> net/ceph/osd_client.c | 31 ++++++++++++++++++++-----------
> 1 file changed, 20 insertions(+), 11 deletions(-)
>
> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> index 780caf6..616c6cf 100644
> --- a/net/ceph/osd_client.c
> +++ b/net/ceph/osd_client.c
> @@ -1284,6 +1284,25 @@ static void kick_requests(struct ceph_osd_client
> *osdc, int force_resend)
> for (p = rb_first(&osdc->requests); p; ) {
> req = rb_entry(p, struct ceph_osd_request, r_node);
> p = rb_next(p);
> +
> + /*
> + * For linger requests that have not yet been
> + * registered, move them to the linger list; they'll
> + * be sent to the osd in the loop below. Unregister
> + * the request before re-registering it as a linger
> + * request to ensure the __map_request() below
> + * will decide it needs to be sent.
> + */
> + if (req->r_linger && list_empty(&req->r_linger_item)) {
> + dout("%p tid %llu restart on osd%d\n",
> + req, req->r_tid,
> + req->r_osd ? req->r_osd->o_osd : -1);
> + __unregister_request(osdc, req);
> + __register_linger_request(osdc, req);
> +
Drop the newline?
Reviewed-by: Sage Weil <sage@inktank.com>
> + continue;
> + }
> +
> err = __map_request(osdc, req, force_resend);
> if (err < 0)
> continue; /* error */
> @@ -1298,17 +1317,6 @@ static void kick_requests(struct ceph_osd_client
> *osdc, int force_resend)
> req->r_flags |= CEPH_OSD_FLAG_RETRY;
> }
> }
> - if (req->r_linger && list_empty(&req->r_linger_item)) {
> - /*
> - * register as a linger so that we will
> - * re-submit below and get a new tid
> - */
> - dout("%p tid %llu restart on osd%d\n",
> - req, req->r_tid,
> - req->r_osd ? req->r_osd->o_osd : -1);
> - __register_linger_request(osdc, req);
> - __unregister_request(osdc, req);
> - }
> }
>
> list_for_each_entry_safe(req, nreq, &osdc->req_linger,
> @@ -1316,6 +1324,7 @@ static void kick_requests(struct ceph_osd_client
> *osdc, int force_resend)
> dout("linger req=%p req->r_osd=%p\n", req, req->r_osd);
>
> err = __map_request(osdc, req, force_resend);
> + dout("__map_request returned %d\n", err);
> if (err == 0)
> continue; /* no change and no osd was specified */
> if (err < 0)
> --
> 1.7.9.5
>
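The ordering argument in the commit message can be sketched as a small user-space model. This is only an illustration, not the osd_client.c code: every model_* name is hypothetical, an int stands in for the r_osd pointer (-1 models NULL), and the model assumes that unregistering clears r_osd only while the request is not yet on the linger list, which is what would make the order of the two calls matter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct ceph_osd_request. */
struct model_req {
	int  target_osd;      /* osd chosen by the current osd map */
	int  r_osd;           /* osd the request is attached to; -1 models NULL */
	bool on_linger_list;  /* models !list_empty(&req->r_linger_item) */
};

/* Models __unregister_request(). Assumption: r_osd is cleared only
 * while the request is not yet registered as a linger request. */
static void model_unregister(struct model_req *req)
{
	if (!req->on_linger_list)
		req->r_osd = -1;
}

/* Models __register_linger_request(). */
static void model_register_linger(struct model_req *req)
{
	req->on_linger_list = true;
}

/* Models __map_request(): 0 means "no change", 1 means "(re)send". */
static int model_map_request(struct model_req *req, bool force_resend)
{
	assert(req->target_osd >= 0);
	if (!force_resend && req->r_osd == req->target_osd)
		return 0;
	req->r_osd = req->target_osd;
	return 1;
}

/* Old first loop: map, register as linger, then unregister. */
static bool resent_with_old_order(int osd)
{
	struct model_req req = { osd, osd, false };

	model_map_request(&req, false);   /* target unchanged, nothing to do */
	model_register_linger(&req);
	model_unregister(&req);           /* r_osd survives */
	return model_map_request(&req, false) == 1;  /* second loop */
}

/* Patched first loop: unregister, register as linger, skip the map. */
static bool resent_with_new_order(int osd)
{
	struct model_req req = { osd, osd, false };

	model_unregister(&req);           /* r_osd cleared */
	model_register_linger(&req);
	return model_map_request(&req, false) == 1;  /* second loop */
}
```

In this model, resent_with_old_order() reports that an incomplete linger request whose target osd is unchanged is skipped by the second loop, while resent_with_new_order() reports that it is re-sent.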
^ permalink raw reply
* Re: [PATCH v3 00/11] xen: Initial kexec/kdump implementation
From: Daniel Kiper @ 2012-12-28 0:53 UTC (permalink / raw)
To: ebiederm
Cc: kexec, xen-devel, konrad.wilk, tglx, maxim.uvarov, andrew.cooper3,
hpa, jbeulich, mingo, x86, virtualization, vgoyal, linux-kernel
> Andrew Cooper <andrew.cooper3@citrix.com> writes:
>
> > On 27/12/2012 07:53, Eric W. Biederman wrote:
> >> The syscall ABI still has the wrong semantics.
> >>
> >> Aka totally unmaintainable and unmergeable.
> >>
> >> The concept of domU support is also strange. What does domU support even mean, when the dom0
> >> support is loading a kernel to pick up Xen when Xen falls over.
> >
> > There are two requirements pulling at this patch series, but I agree
> > that we need to clarify them.
>
> It probably makes sense to split them apart a little even.
>
> > When dom0 loads a crash kernel, it is loading one for Xen to use. As a
> > dom0 crash causes a Xen crash, having dom0 set up a kdump kernel for
> > itself is completely useless. This ability is present in "classic Xen
> > dom0" kernels, but the feature is currently missing in PVOPS.
>
> > Many cloud customers and service providers want the ability for a VM
> > administrator to be able to load a kdump/kexec kernel within a
> > domain[1]. This allows the VM administrator to take more proactive
> > steps to isolate the cause of a crash, the state of which is most likely
> > discarded while tearing down the domain. The result being that as far
> > as Xen is concerned, the domain is still alive, while the kdump
> > kernel/environment can work its usual magic. I am not aware of any
> > feature like this existing in the past.
>
> Which makes domU support semantically just the normal kexec/kdump
> support. Got it.
To some extent. It is true on HVM and PVonHVM guests. However,
PV guests require a slightly different kexec/kdump implementation
than plain kexec/kdump. The proposed firmware support has almost
all required features. The few PV guest specific features will
be added later (after agreeing on generic firmware support, which
is sufficient at least for dom0).
It looks like I should replace domU with PV guest in the patch description.
> The point of implementing domU is for those times when the hypervisor
> admin and the kernel admin are different.
Right.
> For domU support modifying or adding alternate versions of
> machine_kexec.c and relocate_kernel.S to add paravirtualization support
> makes sense.
It is not sufficient. Please look above.
> There is the practical argument that for implementation efficiency of
> crash dumps it would be better if that support came from the hypervisor
> or the hypervisor environment. But this gets into the practical reality
I am thinking about that.
> that the hypervisor environment does not do that today. Furthermore
> kexec all by itself working in a paravirtualized environment under Xen
> makes sense.
>
> domU support is what Peter was worrying about for cleanliness, and
> we need some x86 backend ops there, and generally to be careful.
As far as I know, we do not need any additional pv_ops stuff
if we place all needed things in kexec firmware support.
Daniel
^ permalink raw reply
* Re: [PATCH V3 5/8] memcg: add per cgroup writeback pages accounting
From: Kamezawa Hiroyuki @ 2012-12-28 0:52 UTC (permalink / raw)
To: Sha Zhengju
Cc: linux-kernel, cgroups, linux-mm, mhocko, akpm, gthelen,
fengguang.wu, glommer, Sha Zhengju
In-Reply-To: <1356456409-14701-1-git-send-email-handai.szj@taobao.com>
(2012/12/26 2:26), Sha Zhengju wrote:
> From: Sha Zhengju <handai.szj@taobao.com>
>
> Similar to dirty page, we add per cgroup writeback pages accounting. The lock
> rule still is:
> 	mem_cgroup_begin_update_page_stat()
> 	modify page WRITEBACK stat
> 	mem_cgroup_update_page_stat()
> 	mem_cgroup_end_update_page_stat()
>
> There are two writeback interfaces to modify: test_clear_page_writeback()
> and test_set_page_writeback().
>
> Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
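
The lock rule quoted above can be sketched as a small user-space model. This is only an illustration of the begin / modify / update / end ordering. The model_* names, the single counter, and the boolean "lock" are hypothetical simplifications, not the memcg API or its real signatures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-cgroup statistics. */
struct model_memcg {
	bool stat_locked;   /* true between begin and end */
	long nr_writeback;  /* per-cgroup count of writeback pages */
};

static void model_begin_update_page_stat(struct model_memcg *m)
{
	assert(!m->stat_locked);
	m->stat_locked = true;
}

/* The counter may only change inside a begin/end section. */
static void model_update_page_stat(struct model_memcg *m, long delta)
{
	assert(m->stat_locked);
	m->nr_writeback += delta;
}

static void model_end_update_page_stat(struct model_memcg *m)
{
	assert(m->stat_locked);
	m->stat_locked = false;
}

/* Models the "test and set writeback" path: returns the old flag and
 * accounts the page only on a 0 -> 1 transition. */
static bool model_test_set_writeback(struct model_memcg *m, bool *page_wb)
{
	bool was_set;

	model_begin_update_page_stat(m);
	was_set = *page_wb;          /* modify page WRITEBACK stat */
	*page_wb = true;
	if (!was_set)
		model_update_page_stat(m, 1);
	model_end_update_page_stat(m);
	return was_set;
}

/* Models the "test and clear writeback" path: returns the old flag and
 * un-accounts the page only on a 1 -> 0 transition. */
static bool model_test_clear_writeback(struct model_memcg *m, bool *page_wb)
{
	bool was_set;

	model_begin_update_page_stat(m);
	was_set = *page_wb;
	*page_wb = false;
	if (was_set)
		model_update_page_stat(m, -1);
	model_end_update_page_stat(m);
	return was_set;
}
```

Keeping the flag change and the counter update inside one begin/end section is what keeps the two from drifting apart in this model: setting an already-set flag, or clearing an already-clear one, leaves the counter untouched.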
^ permalink raw reply
* Re: [PATCH] f2fs: add missing pretech.h include
From: Jaegeuk Kim @ 2012-12-28 0:48 UTC (permalink / raw)
To: Heiko Carstens; +Cc: Namjae Jeon, linux-kernel, linux-fsdevel
In-Reply-To: <20121227143424.GB3223@osiris>
2012-12-27 (Thu), 15:34 +0100, Heiko Carstens:
> On Thu, Dec 27, 2012 at 03:30:32PM +0100, Heiko Carstens wrote:
> > From aa027f06dfb5b2fd27d6f92391d8340df671e82b Mon Sep 17 00:00:00 2001
> > From: Heiko Carstens <heiko.carstens@de.ibm.com>
> > Date: Thu, 27 Dec 2012 15:22:27 +0100
> > Subject: [PATCH] f2fs: add missing pretech.h include
>
> That should have been "prefetch.h", obviously...
> At least the patch is correct.
Hi,
This was already fixed by Geert Uytterhoeven.
I'm planning to push it along with other bug fixes to Linus today.
Anyway, thank you very much. :)
--
Jaegeuk Kim
Samsung
^ permalink raw reply
* [ANNOUNCE] Git v1.8.0.3
From: Junio C Hamano @ 2012-12-28 0:48 UTC (permalink / raw)
To: git; +Cc: Linux Kernel
The latest maintenance release Git v1.8.0.3 is now available at
the usual places.
This is primarily to down-merge documentation updates that have been
accumulating to the 'master' front for the upcoming 1.8.1 to the
maintenance series.
The release tarballs are found at:
http://code.google.com/p/git-core/downloads/list
and their SHA-1 checksums are:
b1695f28448c00e61e110b3c7bcd66c8047ef176 git-1.8.0.3.tar.gz
83c46b62e0c3979c5ef77a407ca41507658b5726 git-htmldocs-1.8.0.3.tar.gz
63df55f90b9c6c11dd827ce1880b5b5fdf79c1c1 git-manpages-1.8.0.3.tar.gz
Also the following public repositories all have a copy of the v1.8.0.3
tag and the maint branch that the tag points at:
url = git://repo.or.cz/alt-git.git
url = https://code.google.com/p/git-core/
url = git://git.sourceforge.jp/gitroot/git-core/git.git
url = git://git-core.git.sourceforge.net/gitroot/git-core/git-core
url = https://github.com/gitster/git
Git v1.8.0.3 Release Notes
==========================
Fixes since v1.8.0.2
--------------------
* "git log -p -S<string>" did not apply the textconv filter while
looking for the <string>.
* In the documentation, some invalid example e-mail addresses were
formatted into mailto: links.
Also contains many documentation updates backported from the 'master'
branch that is preparing for the upcoming 1.8.1 release.
----------------------------------------------------------------
Changes since v1.8.0.2 are as follows:
Adam Spiers (2):
SubmittingPatches: add convention of prefixing commit messages
Documentation: move support for old compilers to CodingGuidelines
Anders Kaseorg (1):
git-prompt: Document GIT_PS1_DESCRIBE_STYLE
Chris Rorvick (2):
Documentation/git-checkout.txt: clarify usage
Documentation/git-checkout.txt: document 70c9ac2 behavior
Gunnlaugur Þór Briem (1):
Document git-svn fetch --log-window-size parameter
Jeff King (7):
pickaxe: hoist empty needle check
pickaxe: use textconv for -S counting
.mailmap: match up some obvious names/emails
.mailmap: fix broken entry for Martin Langhoff
.mailmap: normalize emails for Jeff King
.mailmap: normalize emails for Linus Torvalds
contrib: update stats/mailmap script
John Keeping (1):
Documentation: don't link to example mail addresses
Junio C Hamano (6):
fetch --tags: clarify documentation
README: it does not matter who the current maintainer is
t7004: do not create unneeded gpghome/gpg.conf when GPG is not used
Documentation: Describe "git diff <blob> <blob>" separately
git(1): show link to contributor summary page
Git 1.8.0.3
Krzysztof Mazur (1):
doc: git-reset: make "<mode>" optional
Manlio Perillo (1):
git.txt: add missing info about --git-dir command-line option
Matthew Daley (1):
Fix sizeof usage in get_permutations
Max Horn (1):
git-remote-helpers.txt: document invocation before input format
Nguyễn Thái Ngọc Duy (1):
index-format.txt: clarify what is "invalid"
Ramkumar Ramachandra (1):
Documentation: move diff.wordRegex from config.txt to diff-config.txt
Sebastian Leske (4):
git-svn: Document branches with at-sign(@).
git-svn: Recommend use of structure options.
git-svn: Expand documentation for --follow-parent
git-svn: Note about tags.
Sitaram Chamarty (1):
clarify -M without % symbol in diff-options
Stefano Lattarini (1):
README: Git is released under the GPLv2, not just "the GPL"
Thomas Ackermann (8):
Split over-long synopsis in git-fetch-pack.txt into several lines
Shorten two over-long lines in git-bisect-lk2009.txt by abbreviating some sha1
Change headline of technical/send-pack-pipeline.txt to not confuse its content with content from git-send-pack.txt
Documentation/technical: convert plain text files to asciidoc
Documentation/howto: convert plain text files to asciidoc
Documentation: build html for all files in technical and howto
Remove misleading date from api-index-skel.txt
Sort howto documents in howto-index.txt
Tom Jones (1):
Add -S, --gpg-sign option to manpage of "git commit"
^ permalink raw reply
* Re: vxlan in Linux kernel 3.7
From: Naoto MATSUMOTO @ 2012-12-28 0:46 UTC (permalink / raw)
To: Qin, Xiaohong; +Cc: netdev@vger.kernel.org
In-Reply-To: <A3CA455BB4F1DA4E92CB43AAF0E4BB1D0667E12B@MX01A.corp.emc.com>
Hi Qin
FYI(For Your Information)
A First Look At VXLAN over Infiniband Network On Linux 3.7-rc7 & iproute2
http://slidesha.re/TsCKWc
On Thu, 27 Dec 2012 13:42:51 -0500
"Qin, Xiaohong" <Xiaohong.Qin@emc.com> wrote:
> Hi All,
>
> I have installed kernel 3.7 on my Linux box, see the following uname -a output,
>
> uname -a
> Linux c210-m2-sib-3 3.7.0-030700-generic #201212102335 SMP Tue Dec 11 04:36:24 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
>
> Does that mean I've got VXLAN module loaded or I still need to go through some extra steps to enable or configure it? Do you have any VXLAN setup or configuration document by chance?
>
> Thanks.
>
> Dennis Qin
>
> P.S. If this is not the right place to ask this kind of questions, please let me know which mailing list I should use.
--
SAKURA Internet Inc. / Senior Researcher
Naoto MATSUMOTO <n-matsumoto@sakura.ad.jp>
SAKURA Research Center <http://research.sakura.ad.jp/>
^ permalink raw reply
* Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000500
From: Zlatko Calusic @ 2012-12-28 0:42 UTC (permalink / raw)
To: sedat.dilek; +Cc: LKML, linux-mm
In-Reply-To: <CA+icZUWDPux3QAzQ85ntfUTHMr5m3Ueo-zjfnoGGxi=VrB9_7A@mail.gmail.com>
On 28.12.2012 01:37, Sedat Dilek wrote:
> On Fri, Dec 28, 2012 at 1:33 AM, Zlatko Calusic <zlatko.calusic@iskon.hr> wrote:
>> On 28.12.2012 01:24, Sedat Dilek wrote:
>>>
>>> On Fri, Dec 28, 2012 at 12:51 AM, Zlatko Calusic
>>> <zlatko.calusic@iskon.hr> wrote:
>>>>
>>>> On 28.12.2012 00:42, Sedat Dilek wrote:
>>>>>
>>>>>
>>>>> On Fri, Dec 28, 2012 at 12:39 AM, Zlatko Calusic
>>>>> <zlatko.calusic@iskon.hr> wrote:
>>>>>>
>>>>>>
>>>>>> On 28.12.2012 00:30, Sedat Dilek wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hi Zlatko,
>>>>>>>
>>>>>>> I am not sure if I hit the same problem as described in this thread.
>>>>>>>
>>>>>>> Under heavy load, while building a customized toolchain for the Freetz
>>>>>>> router project I got a BUG || NULL pointer dereference || kswapd ||
>>>>>>> zone_balanced || pgdat_balanced() etc. (details see my screenshot).
>>>>>>>
>>>>>>> I will try your patch from [1] ***only*** on top of my last
>>>>>>> Linux-v3.8-rc1 GIT setup (post-v3.8-rc1 mainline + some net-fixes).
>>>>>>>
>>>>>>
>>>>>> Yes, that's the same bug. It should be fixed with my latest patch, so
>>>>>> I'd appreciate you testing it, to be on the safe side this time. There
>>>>>> should be no difference if you apply it to anything newer than 3.8-rc1,
>>>>>> so go for it. Thanks!
>>>>>>
>>>>>
>>>>> Not sure how I can really reproduce this bug as one build worked fine
>>>>> within my last v3.8-rc1 kernel.
>>>>> I increased the parallel-make-jobs-number from "4" to "8" to stress a
>>>>> bit harder.
>>>>> Just building right now... and will report.
>>>>>
>>>>> If you have any test-case (script or whatever), please let me/us know.
>>>>>
>>>>
>>>> Unfortunately not, I haven't reproduced it yet on my machines. But it
>>>> seems that bug will hit only under heavy memory pressure. When close to
>>>> OOM, or possibly with lots of writing to disk. It's also possible that
>>>> fragmentation of memory zones could provoke it, that means testing it
>>>> for a longer time.
>>>>
>>>
>>> I tested successfully by doing simultaneously...
>>> - building Freetz with 8 parallel make-jobs
>>> - building Linux GIT with 1 make-job
>>> - 9 tabs open in firefox
>>> - In one tab I ran YouTube music video
>>> - etc.
>>>
>>> I am reading [1] and [2] where another user reports success by reverting
>>> this...
>>>
>>> commit cda73a10eb3f493871ed39f468db50a65ebeddce
>>> "mm: do not sleep in balance_pgdat if there's no i/o congestion"
>>>
>>> BTW, this machine has also 4GiB RAM (Ubuntu/precise AMD64).
>>>
>>> Feel free to add a "Reported-by/Tested-by" if you think this is a
>>> positive report.
>>>
>>
>> Thanks for the testing! And keep running it in case something interesting
>> pops up. ;)
>>
>> No need to revert cda73a10eb because it fixes another bug. And the patch
>> you're now running fixes the new bug I introduced with a combination of my
>> latest 2 patches. Nah, it gets complicated... :)
>>
>> But, at least I found the culprit and as soon as Linus applies the fix,
>> everything will be hunky dory again, at least on this front. :P
>>
>
> I am not subscribed to LKML and linux-mm...
> Do you have a patch with a proper subject and descriptive text? URL?
>
Soon to follow. I'd appreciate Zhouping Liu testing it too, though.
--
Zlatko
^ permalink raw reply
* Re: TUN problems (regression?)
From: Stephen Hemminger @ 2012-12-28 0:41 UTC (permalink / raw)
To: Jason Wang; +Cc: Eric Dumazet, Paul Moore, netdev
In-Reply-To: <50D3E510.6020008@redhat.com>
On Fri, 21 Dec 2012 12:26:56 +0800
Jason Wang <jasowang@redhat.com> wrote:
> On 12/21/2012 11:39 AM, Eric Dumazet wrote:
> > On Fri, 2012-12-21 at 11:32 +0800, Jason Wang wrote:
> >> On 12/21/2012 07:50 AM, Stephen Hemminger wrote:
> >>> On Thu, 20 Dec 2012 15:38:17 -0800
> >>> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >>>
> >>>> On Thu, 2012-12-20 at 18:16 -0500, Paul Moore wrote:
> >>>>> [CC'ing netdev in case this is a known problem I just missed ...]
> >>>>>
> >>>>> Hi Jason,
> >>>>>
> >>>>> I started doing some more testing with the multiqueue TUN changes and I ran
> >>>>> into a problem when running tunctl: running it once w/o arguments works as
> >>>>> expected, but running it a second time results in failure and a
> >>>>> kmem_cache_sanity_check() failure. The problem appears to be very repeatable
> >>>>> on my test VM and happens independent of the LSM/SELinux fixup patches.
> >>>>>
> >>>>> Have you seen this before?
> >>>>>
> >>>> Obviously code in tun_flow_init() is wrong...
> >>>>
> >>>> static int tun_flow_init(struct tun_struct *tun)
> >>>> {
> >>>> 	int i;
> >>>>
> >>>> 	tun->flow_cache = kmem_cache_create("tun_flow_cache",
> >>>> 					    sizeof(struct tun_flow_entry), 0, 0,
> >>>> 					    NULL);
> >>>> 	if (!tun->flow_cache)
> >>>> 		return -ENOMEM;
> >>>> 	...
> >>>> }
> >>>>
> >>>>
> >>>> I have no idea why we would need a kmem_cache per tun_struct,
> >>>> and why we even need a kmem_cache.
> >>> Normally flow malloc/free should be good enough.
> >>> It might make sense to use private kmem_cache if doing hlist_nulls.
> >>>
> >>>
> >>> Acked-by: Stephen Hemminger <shemminger@vyatta.com>
> >> Should be at least a global cache, I thought I can get some speed-up by
> >> using kmem_cache.
> >>
> >> Acked-by: Jason Wang <jasowang@redhat.com>
> > Was it with SLUB or SLAB ?
> >
> > Using generic kmalloc-64 is better than a dedicated kmem_cache of 48
> > bytes per object, as we guarantee each object is on a single cache line.
> >
> >
>
> Right, thanks for the explanation.
>
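
The cache-line point quoted above can be checked with a little arithmetic. The sketch below assumes a 64-byte cache line, a slab that starts on a line boundary, and objects laid out back to back at a fixed stride. Real slab allocators may add metadata or alignment, so this models the argument rather than SLUB or SLAB behaviour, and the model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_CACHE_LINE 64u  /* assumed cache line size in bytes */

/* True if an object of obj_size bytes starting at byte offset `offset`
 * crosses a cache-line boundary. */
static bool model_straddles_line(size_t offset, size_t obj_size)
{
	assert(obj_size > 0);
	return offset / MODEL_CACHE_LINE !=
	       (offset + obj_size - 1) / MODEL_CACHE_LINE;
}

/* Counts how many of the first n objects, placed every `stride` bytes,
 * span two cache lines. */
static unsigned model_count_straddlers(size_t stride, size_t obj_size,
				       unsigned n)
{
	unsigned i, count = 0;

	for (i = 0; i < n; i++)
		if (model_straddles_line((size_t)i * stride, obj_size))
			count++;
	return count;
}
```

Under these assumptions, 48-byte objects packed at a 48-byte stride put two of every four objects across a line boundary (those at offsets 48 and 96), whereas the same objects in 64-byte slots never straddle.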
I wonder if TUN would be better if it used an array to translate
receive hash to receive queue. This is how real hardware works with the
indirection table, and it would allow RFS acceleration. The current flow
cache stuff is prone to DoS attacks and scaling problems with lots of
short-lived flows.
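
A hash-to-queue indirection table of the kind suggested here could look roughly like the sketch below. It is a user-space illustration only: the table size of 128, the round-robin default fill, and all model_* names are assumptions made for the example, not part of the tun driver.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_INDIR_SIZE 128u  /* hypothetical size; must be a power of two */

/* Fixed-size table: memory use does not grow with the number of flows. */
struct model_indir_table {
	uint16_t queue[MODEL_INDIR_SIZE];
};

/* Default fill: spread the buckets round-robin over the queues. */
static void model_indir_init(struct model_indir_table *t, unsigned nqueues)
{
	unsigned i;

	assert(nqueues > 0);
	for (i = 0; i < MODEL_INDIR_SIZE; i++)
		t->queue[i] = (uint16_t)(i % nqueues);
}

/* Receive path: the low bits of the receive hash select the bucket. */
static uint16_t model_indir_lookup(const struct model_indir_table *t,
				   uint32_t rxhash)
{
	return t->queue[rxhash & (MODEL_INDIR_SIZE - 1)];
}

/* Steering: point the bucket for this hash at another queue. */
static void model_indir_steer(struct model_indir_table *t, uint32_t rxhash,
			      uint16_t queue)
{
	t->queue[rxhash & (MODEL_INDIR_SIZE - 1)] = queue;
}
```

Because the table has a fixed number of buckets, a flood of short-lived flows cannot make it grow. The trade-off is that all flows hashing to the same bucket share one queue.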
^ permalink raw reply
* Re: [PATCH V3 3/8] use vfs __set_page_dirty interface instead of doing it inside filesystem
From: Kamezawa Hiroyuki @ 2012-12-28 0:41 UTC (permalink / raw)
To: Sha Zhengju
Cc: linux-kernel, cgroups, linux-mm, linux-fsdevel, ceph-devel, sage,
dchinner, mhocko, akpm, gthelen, fengguang.wu, glommer,
Sha Zhengju
In-Reply-To: <1356456261-14579-1-git-send-email-handai.szj@taobao.com>
(2012/12/26 2:24), Sha Zhengju wrote:
> From: Sha Zhengju <handai.szj@taobao.com>
>
> Going forward, we will treat SetPageDirty and dirty page accounting as an integrated
> operation. Filesystems should use the VFS interface directly to avoid dealing with those details.
>
> Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
> Acked-by: Sage Weil <sage@inktank.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
^ permalink raw reply
* Re: [Xenomai] Xenomai 2.6.2 + Kernel 3.5.3 + AMD Phenom II X6 1055
From: Gilles Chanteperdrix @ 2012-12-28 0:41 UTC (permalink / raw)
To: Mariusz Janiak; +Cc: Xenomai
In-Reply-To: <50dcddf63f5d48.40617259@wp.pl>
On 12/28/2012 12:47 AM, Mariusz Janiak wrote:
> Hi
>
> As soon as you publish new Xenomai 2.6.2 release, I have tried to launch it on my desktop, and unfortunately I went into trouble. My testbed:
>
> - Xenomai 2.6.2,
>
> - Kernel 3.5.3 (I haven't tested earlier versions),
>
> - AMD Phenom II X6 1055 + Gigabyte GA-890GPA-UD3H
>
> - Ubuntu 12.10.
>
> The problem is following (from dmesg):
The real problem is here (from your dmesg):
> System has AMD C1E enabled
> Switch to broadcast mode on CPU1
> Switch to broadcast mode on CPU2
> Switch to broadcast mode on CPU3
> Switch to broadcast mode on CPU4
> I went through installation manual, troubleshooting guide and forum
> archives, maybe I have missed something. What is more I have tested this
> kernel on my laptop HP pavilion dm1 with AMD E-350 and it works
> without problems.
We have a message saying:
I-pipe: cannot use LAPIC as a tick device
I-pipe: disable C1E power state in your BIOS
For some reason, however, it does not get printed in your case. I agree
that we should document this in the TROUBLESHOOTING guide.
--
Gilles.
^ permalink raw reply
* Re: [PATCH V3 2/8] Make TestSetPageDirty and dirty page accounting in one func
From: Kamezawa Hiroyuki @ 2012-12-28 0:39 UTC (permalink / raw)
To: Sha Zhengju
Cc: linux-kernel, cgroups, linux-mm, linux-fsdevel, dchinner, mhocko,
akpm, gthelen, fengguang.wu, glommer, Sha Zhengju
In-Reply-To: <1356456156-14535-1-git-send-email-handai.szj@taobao.com>
(2012/12/26 2:22), Sha Zhengju wrote:
> From: Sha Zhengju <handai.szj@taobao.com>
>
> Commit a8e7d49a(Fix race in create_empty_buffers() vs __set_page_dirty_buffers())
> extracts TestSetPageDirty from __set_page_dirty and is far away from
> account_page_dirtied. But it's better to make the two operations in one single
> function to keep modular. So in order to avoid the potential race mentioned in
> commit a8e7d49a, we can hold private_lock until __set_page_dirty completes.
> There's no deadlock between ->private_lock and ->tree_lock after confirmation.
> It's a prepare patch for following memcg dirty page accounting patches.
>
>
> Here is some test numbers that before/after this patch:
> Test steps(Mem-4g, ext4):
> drop_cache; sync
> fio (ioengine=sync/write/buffered/bs=4k/size=1g/numjobs=2/group_reporting/thread)
>
> We test it for 10 times and get the average numbers:
> Before:
> write: io=2048.0MB, bw=254117KB/s, iops=63528.9 , runt= 8279msec
> lat (usec): min=1 , max=742361 , avg=30.918, stdev=1601.02
> After:
> write: io=2048.0MB, bw=254044KB/s, iops=63510.3 , runt= 8274.4msec
> lat (usec): min=1 , max=856333 , avg=31.043, stdev=1769.32
>
> Note that the impact is little(<1%).
>
>
> Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
> Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Hmm... this change should be double-checked by the vfs and I/O guys.
Doesn't increasing the hold time of mapping->private_lock affect performance?
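For reference, the arithmetic behind the "<1%" note can be sketched as below. This uses only the averages quoted above; the helper name rel_change_pct is made up for illustration and is not part of the patch.

```c
#include <assert.h>

/*
 * Relative change of `after` with respect to `before`, in percent.
 * Hypothetical helper for checking the quoted fio averages by hand.
 */
double rel_change_pct(double before, double after)
{
    assert(before != 0.0);  /* a zero baseline has no relative change */
    return (after - before) / before * 100.0;
}
```

Fed the quoted figures, rel_change_pct(254117, 254044) is about -0.03% for bandwidth and rel_change_pct(30.918, 31.043) is about +0.4% for average latency. The quoted max latency and stdev move more (roughly +15% and +10%), which the averages alone do not show.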
^ permalink raw reply
* Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000500
From: Sedat Dilek @ 2012-12-28 0:37 UTC (permalink / raw)
To: Zlatko Calusic; +Cc: LKML, linux-mm
In-Reply-To: <50DCE8C8.8050103@iskon.hr>
On Fri, Dec 28, 2012 at 1:33 AM, Zlatko Calusic <zlatko.calusic@iskon.hr> wrote:
> On 28.12.2012 01:24, Sedat Dilek wrote:
>>
>> On Fri, Dec 28, 2012 at 12:51 AM, Zlatko Calusic
>> <zlatko.calusic@iskon.hr> wrote:
>>>
>>> On 28.12.2012 00:42, Sedat Dilek wrote:
>>>>
>>>>
>>>> On Fri, Dec 28, 2012 at 12:39 AM, Zlatko Calusic
>>>> <zlatko.calusic@iskon.hr> wrote:
>>>>>
>>>>>
>>>>> On 28.12.2012 00:30, Sedat Dilek wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi Zlatko,
>>>>>>
>>>>>> I am not sure if I hit the same problem as described in this thread.
>>>>>>
>>>>>> Under heavy load, while building a customized toolchain for the Freetz
>>>>>> router project I got a BUG || NULL pointer dereference || kswapd ||
>>>>>> zone_balanced || pgdat_balanced() etc. (details see my screenshot).
>>>>>>
>>>>>> I will try your patch from [1] ***only*** on top of my last
>>>>>> Linux-v3.8-rc1 GIT setup (post-v3.8-rc1 mainline + some net-fixes).
>>>>>>
>>>>>
>>>>> Yes, that's the same bug. It should be fixed with my latest patch, so
>>>>> I'd
>>>>> appreciate you testing it, to be on the safe side this time. There
>>>>> should
>>>>> be
>>>>> no difference if you apply it to anything newer than 3.8-rc1, so go for
>>>>> it.
>>>>> Thanks!
>>>>>
>>>>
>>>> Not sure how I can really reproduce this bug as one build worked fine
>>>> within my last v3.8-rc1 kernel.
>>>> I increased the parallel-make-jobs-number from "4" to "8" to stress a
>>>> bit harder.
>>>> Just building right now... and will report.
>>>>
>>>> If you have any test-case (script or whatever), please let me/us know.
>>>>
>>>
>>> Unfortunately not, I haven't reproduced it yet on my machines. But it
>>> seems
>>> that bug will hit only under heavy memory pressure. When close to OOM, or
>>> possibly with lots of writing to disk. It's also possible that
>>> fragmentation
>>> of memory zones could provoke it, that means testing it for a longer
>>> time.
>>>
>>
>> I tested successfully by doing simultaneously...
>> - building Freetz with 8 parallel make-jobs
>> - building Linux GIT with 1 make-job
>> - 9 tabs open in firefox
>> - In one tab I ran YouTube music video
>> - etc.
>>
>> I am reading [1] and [2] where another user reports success by reverting
>> this...
>>
>> commit cda73a10eb3f493871ed39f468db50a65ebeddce
>> "mm: do not sleep in balance_pgdat if there's no i/o congestion"
>>
>> BTW, this machine has also 4GiB RAM (Ubuntu/precise AMD64).
>>
>> Feel free to add a "Reported-by/Tested-by" if you think this is a
>> positive report.
>>
>
> Thanks for the testing! And keep running it in case something interesting
> pops up. ;)
>
> No need to revert cda73a10eb because it fixes another bug. And the patch
> you're now running fixes the new bug I introduced with a combination of my
> latest 2 patches. Nah, it gets complicated... :)
>
> But, at least I found the culprit and as soon as Linus applies the fix,
> everything will be hunky dory again, at least on this front. :P
>
I am not subscribed to LKML or linux-mm...
Do you have a patch with a proper subject and descriptive text? URL?
- Sedat -
> Thanks,
> --
> Zlatko
^ permalink raw reply
* Re: Another novice question & comment
From: Russell Coker @ 2012-12-28 0:34 UTC (permalink / raw)
To: Chris Murphy; +Cc: linux-btrfs Mailing list
In-Reply-To: <5BAA2347-6336-4C05-B075-BEF93427EF07@colorremedies.com>
On Fri, 28 Dec 2012, Chris Murphy <lists@colorremedies.com> wrote:
> On Dec 27, 2012, at 12:27 PM, Gene Czarcinski <gene@czarc.net> wrote:
> > Oh thanks for that little reminder that you can put btrfs on an LV.
>
> I find it's more trouble than it's worth. It doesn't bring much to the
> table.
I've tried using LVM and BTRFS together. While they work, the combination
doesn't seem to offer much benefit. LVM is good for snapshots (which BTRFS does
better) and also for dividing a device that is larger than your filesystem can
properly support (also not a problem for BTRFS).
http://etbe.coker.com.au/2012/12/17/using-btrfs/
At the above URL I've documented some of the things I'm currently doing with
BTRFS in production. I'm still considering what's the best way of managing
virtual machines. My current method is to run a server with two disks that
have separate LVM VGs and give each VM a pair of block devices to run BTRFS
RAID-1.
The other option I'm considering is a single BTRFS RAID-1 taking all disk
space and giving each VM a single block device that's a file on the BTRFS
filesystem. Presumably that will give a significant performance hit because of
double filesystem overhead but will make management a little easier and
possibly reduce seeks when multiple VMs are writing to disk.
--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
^ permalink raw reply
* Re: [PATCH] x86, mm: add generic kernel/ident mapping helper
From: Yinghai Lu @ 2012-12-28 0:34 UTC (permalink / raw)
To: Borislav Petkov, Yinghai Lu, H. Peter Anvin, LKML
In-Reply-To: <20121227185126.GA10207@x1.alien8.de>
On Thu, Dec 27, 2012 at 10:51 AM, Borislav Petkov <bp@alien8.de> wrote:
>> +struct mapping_info {
>> + void *(*alloc)(void *);
>
> alloc_page
alloc_page makes me think it will return struct page *.
>
>> + void *data;
>> + unsigned long flag;
>
> page_flags;
will change to pmd_flags
>
>> + bool kernel;
>
> kernel_space?
That is used to tell whether it is a kernel mapping or an ident mapping.
Will change to is_kernel_mapping or kernel_mapping instead.
>
> In general, all those members could use more meaningful names and some
> commenting explaining what they are, instead of people having to deduce
> what they mean from their usage in the code.
>
> Also, struct name 'mapping_info' is too generic. Maybe
> ident_mapping_info?
Would you like to name it kernel_ident_mapping_info?
That looks too long.
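To make the naming discussion concrete, here is a userspace sketch of how the struct might read with the renames mentioned in this thread and the comments that were asked for. It keeps the struct name from the patch since that is still undecided. alloc_pgt_page, SKETCH_PAGE_SIZE, sketch_alloc_pgt_page and mapping_get_page are hypothetical names for illustration only, and calloc() merely stands in for whatever allocator the caller supplies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096  /* assumption: 4 KiB page-table pages */

struct mapping_info {
    /* Returns a zeroed page-table page (memory, not a struct page *). */
    void *(*alloc_pgt_page)(void *data);  /* was `alloc`; name is hypothetical */
    void *data;               /* opaque cookie handed back to the allocator */
    unsigned long pmd_flags;  /* was `flag` */
    bool kernel_mapping;      /* was `kernel`: true = kernel mapping,
                                 false = ident mapping */
};

/* Example allocator: counts allocations through the `data` cookie. */
void *sketch_alloc_pgt_page(void *data)
{
    size_t *count = data;
    void *page = calloc(1, SKETCH_PAGE_SIZE);

    if (page && count)
        (*count)++;
    return page;
}

/* How a mapping helper would obtain a new page-table page. */
void *mapping_get_page(struct mapping_info *info)
{
    assert(info && info->alloc_pgt_page);
    return info->alloc_pgt_page(info->data);
}
```

A caller fills in the struct once and passes it down, so the mapping code never needs to know where the pages come from.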
^ permalink raw reply
* Re: [PATCH] libceph: fix protocol feature mismatch failure path
From: Sage Weil @ 2012-12-28 0:33 UTC (permalink / raw)
To: Alex Elder; +Cc: ceph-devel
In-Reply-To: <50DCE810.2030405@inktank.com>
On Thu, 27 Dec 2012, Alex Elder wrote:
> On 12/27/2012 06:07 PM, Sage Weil wrote:
> > We should not set con->state to CLOSED here; that happens in ceph_fault()
> > in the caller, where it first asserts that the state is not yet CLOSED.
>
> I'm not seeing the code path you're talking about.
> Do you mean in the LOSSYTX case?
It's when the feature bits don't match. It calls this, sets CLOSED, and
then returns -1 and con_work() calls ceph_fault().
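A stripped-down userspace model of that path, in case it helps to follow along. This is only a sketch: every sketch_* name is invented here, the real reset_connection() work is omitted, and the spot where ceph_fault() would hit its assertion is modeled as a -1 return instead of a BUG.

```c
#include <assert.h>

enum sketch_state { SKETCH_NEGOTIATING, SKETCH_CLOSED };

struct sketch_con {
    enum sketch_state state;
};

/* Old behavior: mark the connection CLOSED before returning the error. */
int sketch_fail_protocol_old(struct sketch_con *con)
{
    con->state = SKETCH_CLOSED;
    return -1;
}

/* Patched behavior: leave the state alone, the fault handler closes it. */
int sketch_fail_protocol_new(struct sketch_con *con)
{
    (void)con;  /* the real code still resets the connection here */
    return -1;
}

/*
 * Stand-in for the fault handler: it expects the state not to be CLOSED
 * yet. Returns -1 where the real code would trip its assertion.
 */
int sketch_fault(struct sketch_con *con)
{
    if (con->state == SKETCH_CLOSED)
        return -1;
    con->state = SKETCH_CLOSED;
    return 0;
}

/* Stand-in for the worker: on a failed negotiation, call the fault handler. */
int sketch_con_work(struct sketch_con *con,
                    int (*fail_protocol)(struct sketch_con *))
{
    assert(con && fail_protocol);
    if (fail_protocol(con) < 0)
        return sketch_fault(con);
    return 0;
}
```

With sketch_fail_protocol_old the fault handler sees CLOSED and reports the bad case; with sketch_fail_protocol_new it sees NEGOTIATING and closes the connection itself.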
>
> (I don't doubt you're right, I'm just trying to follow
> along at home.)
>
> > Avoids a BUG when the features don't match.
>
> Is this related to the crashes reported here?
> http://tracker.newdream.net/issues/3657
>
> > Signed-off-by: Sage Weil <sage@inktank.com>
> > ---
> > net/ceph/messenger.c | 2 +-
> > 1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
> > index 4d111fd..24a5c86 100644
> > --- a/net/ceph/messenger.c
> > +++ b/net/ceph/messenger.c
> > @@ -1508,9 +1508,9 @@ static int process_banner(struct ceph_connection *con)
> >
> > static void fail_protocol(struct ceph_connection *con)
> > {
> > + dout("fail_protocol %p\n", con);
> > reset_connection(con);
> > BUG_ON(con->state != CON_STATE_NEGOTIATING);
>
> Since fail_protocol becomes essentially a trivial wrapper
> for reset_connection(), I think it should just go away
> and call reset_connection() directly. The assertion that
> it's in NEGOTIATING state is not very useful at the moment;
> fail_protocol() is only called from process_connect(),
> and that's only called from try_read when the state
> is NEGOTIATING.
Good point. I'll clean that up!
>
> -Alex
>
> > - con->state = CON_STATE_CLOSED;
> > }
> >
> > static int process_connect(struct ceph_connection *con)
> >
>
>
^ permalink raw reply
* Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000500
From: Zlatko Calusic @ 2012-12-28 0:33 UTC (permalink / raw)
To: sedat.dilek; +Cc: LKML, linux-mm
In-Reply-To: <CA+icZUUXbCCtPaimG6hKsSUQTmnoMAZHsd_nrn7eityepqYUkQ@mail.gmail.com>
On 28.12.2012 01:24, Sedat Dilek wrote:
> On Fri, Dec 28, 2012 at 12:51 AM, Zlatko Calusic
> <zlatko.calusic@iskon.hr> wrote:
>> On 28.12.2012 00:42, Sedat Dilek wrote:
>>>
>>> On Fri, Dec 28, 2012 at 12:39 AM, Zlatko Calusic
>>> <zlatko.calusic@iskon.hr> wrote:
>>>>
>>>> On 28.12.2012 00:30, Sedat Dilek wrote:
>>>>>
>>>>>
>>>>> Hi Zlatko,
>>>>>
>>>>> I am not sure if I hit the same problem as described in this thread.
>>>>>
>>>>> Under heavy load, while building a customized toolchain for the Freetz
>>>>> router project I got a BUG || NULL pointer dereference || kswapd ||
>>>>> zone_balanced || pgdat_balanced() etc. (details see my screenshot).
>>>>>
>>>>> I will try your patch from [1] ***only*** on top of my last
>>>>> Linux-v3.8-rc1 GIT setup (post-v3.8-rc1 mainline + some net-fixes).
>>>>>
>>>>
>>>> Yes, that's the same bug. It should be fixed with my latest patch, so I'd
>>>> appreciate you testing it, to be on the safe side this time. There should
>>>> be
>>>> no difference if you apply it to anything newer than 3.8-rc1, so go for
>>>> it.
>>>> Thanks!
>>>>
>>>
>>> Not sure how I can really reproduce this bug as one build worked fine
>>> within my last v3.8-rc1 kernel.
>>> I increased the parallel-make-jobs-number from "4" to "8" to stress a
>>> bit harder.
>>> Just building right now... and will report.
>>>
>>> If you have any test-case (script or whatever), please let me/us know.
>>>
>>
>> Unfortunately not, I haven't reproduced it yet on my machines. But it seems
>> that bug will hit only under heavy memory pressure. When close to OOM, or
>> possibly with lots of writing to disk. It's also possible that fragmentation
>> of memory zones could provoke it, that means testing it for a longer time.
>>
>
> I tested successfully by doing simultaneously...
> - building Freetz with 8 parallel make-jobs
> - building Linux GIT with 1 make-job
> - 9 tabs open in firefox
> - In one tab I ran YouTube music video
> - etc.
>
> I am reading [1] and [2] where another user reports success by reverting this...
>
> commit cda73a10eb3f493871ed39f468db50a65ebeddce
> "mm: do not sleep in balance_pgdat if there's no i/o congestion"
>
> BTW, this machine has also 4GiB RAM (Ubuntu/precise AMD64).
>
> Feel free to add a "Reported-by/Tested-by" if you think this is a
> positive report.
>
Thanks for the testing! And keep running it in case something
interesting pops up. ;)
No need to revert cda73a10eb because it fixes another bug. And the patch
you're now running fixes the new bug I introduced with a combination of
my latest 2 patches. Nah, it gets complicated... :)
But, at least I found the culprit and as soon as Linus applies the fix,
everything will be hunky dory again, at least on this front. :P
Thanks,
--
Zlatko
^ permalink raw reply
* Re: [PATCH] libceph: fix protocol feature mismatch failure path
From: Alex Elder @ 2012-12-28 0:30 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
In-Reply-To: <1356653240-23146-1-git-send-email-sage@inktank.com>
On 12/27/2012 06:07 PM, Sage Weil wrote:
> We should not set con->state to CLOSED here; that happens in ceph_fault()
> in the caller, where it first asserts that the state is not yet CLOSED.
I'm not seeing the code path you're talking about.
Do you mean in the LOSSYTX case?
(I don't doubt you're right, I'm just trying to follow
along at home.)
> Avoids a BUG when the features don't match.
Is this related to the crashes reported here?
http://tracker.newdream.net/issues/3657
> Signed-off-by: Sage Weil <sage@inktank.com>
> ---
> net/ceph/messenger.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
> index 4d111fd..24a5c86 100644
> --- a/net/ceph/messenger.c
> +++ b/net/ceph/messenger.c
> @@ -1508,9 +1508,9 @@ static int process_banner(struct ceph_connection *con)
>
> static void fail_protocol(struct ceph_connection *con)
> {
> + dout("fail_protocol %p\n", con);
> reset_connection(con);
> BUG_ON(con->state != CON_STATE_NEGOTIATING);
Since fail_protocol becomes essentially a trivial wrapper
for reset_connection(), I think it should just go away
and call reset_connection() directly. The assertion that
it's in NEGOTIATING state is not very useful at the moment;
fail_protocol() is only called from process_connect(),
and that's only called from try_read when the state
is NEGOTIATING.
-Alex
> - con->state = CON_STATE_CLOSED;
> }
>
> static int process_connect(struct ceph_connection *con)
>
^ permalink raw reply