Discussion:
[PATCH 0/4] mmu notifier debug checks v2
Daniel Vetter
2018-12-10 10:36:37 UTC
Hi all,

Here's v2 of my mmu notifier debug checks.

I think the last two patches could probably be extended to all callbacks,
but I'm not really clear on the exact rules. But happy to extend them if
there's interest.

This stuff helps us catch issues in the i915 mmu notifier implementation.

Thanks, Daniel

Daniel Vetter (4):
mm: Check if mmu notifier callbacks are allowed to fail
kernel.h: Add non_block_start/end()
mm, notifier: Catch sleeping/blocking for !blockable
mm, notifier: Add a lockdep map for invalidate_range_start

include/linux/kernel.h | 10 +++++++++-
include/linux/mmu_notifier.h | 6 ++++++
include/linux/sched.h | 4 ++++
kernel/sched/core.c | 6 +++---
mm/mmu_notifier.c | 18 +++++++++++++++++-
5 files changed, 39 insertions(+), 5 deletions(-)
--
2.20.0.rc1
Daniel Vetter
2018-12-10 10:36:40 UTC
We need to make sure implementations don't cheat and don't have a
possible schedule/blocking point deeply buried where review can't
catch it.

I'm not sure whether this is the best way to make sure all the
might_sleep() callsites trigger, and it's a bit ugly in the code flow.
But it gets the job done.

Inspired by an i915 patch series which did exactly that, because the
rules haven't been entirely clear to us.

v2: Use the shiny new non_block_start/end annotations instead of
abusing preempt_disable/enable.

Cc: Andrew Morton <***@linux-foundation.org>
Cc: Michal Hocko <***@suse.com>
Cc: David Rientjes <***@google.com>
Cc: "Christian König" <***@amd.com>
Cc: Daniel Vetter <***@ffwll.ch>
Cc: "Jérôme Glisse" <***@redhat.com>
Cc: linux-***@kvack.org
Signed-off-by: Daniel Vetter <***@intel.com>
---
mm/mmu_notifier.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index ccc22f21b735..a50ed7d1ecef 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -185,7 +185,13 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
id = srcu_read_lock(&srcu);
hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
if (mn->ops->invalidate_range_start) {
- int _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
+ int _ret;
+
+ if (!blockable)
+ non_block_start();
+ _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
+ if (!blockable)
+ non_block_end();
if (_ret) {
pr_info("%pS callback failed with %d in %sblockable context.\n",
mn->ops->invalidate_range_start, _ret,
--
2.20.0.rc1
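
For reference, patch 2 ("kernel.h: Add non_block_start/end()") is not
quoted in this thread. A minimal sketch of what such annotations could
look like is below, assuming a per-task counter (here called
non_block_count) that the CONFIG_DEBUG_ATOMIC_SLEEP checks in
___might_sleep() consult; the names and placement are assumptions, not
the actual patch:

static inline void non_block_start(void)
{
	/* enter a section in which blocking/scheduling is not allowed */
	WARN_ON(current->non_block_count++);
}

static inline void non_block_end(void)
{
	/* leave the section; calls must balance non_block_start() */
	WARN_ON(current->non_block_count-- == 0);
}
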
Daniel Vetter
2018-12-10 10:36:41 UTC
This is a similar idea to the fs_reclaim fake lockdep lock. It's
fairly easy to provoke a specific notifier to be run on a specific
range: Just prep it, and then munmap() it.
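
As a rough illustration of that recipe (assuming some driver, for
example an i915 userptr buffer, has registered an mmu notifier covering
the mapping; the registration details are driver specific and omitted
here), the userspace side is just:

#include <stdlib.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 2 << 20;
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return 1;

	/* hand [buf, buf + len) to a driver that registers a notifier */

	munmap(buf, len); /* triggers invalidate_range_start on that range */
	return 0;
}
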

A bit harder, but still doable, is to provoke the mmu notifiers for
all the various callchains that might lead to them. But hitting both at
the same time reliably is really hard, especially when you want to
exercise paths like direct reclaim or compaction, where it's not
easy to control what exactly will be unmapped.

By introducing a lockdep map to tie them all together we allow lockdep
to see a lot more dependencies, without having to actually hit them
in a single callchain while testing.

Aside: Since I typed this to test i915 mmu notifiers I've only rolled
this out for the invalidate_range_start callback. If there's
interest, we should probably roll this out to all of them. But my
understanding of core mm is seriously lacking, and I'm not clear on
whether we need a lockdep map for each callback, or whether some can
be shared.

v2: Use lock_map_acquire/release() like fs_reclaim, to avoid confusion
with this being a real mutex (Chris Wilson).

Cc: Chris Wilson <***@chris-wilson.co.uk>
Cc: Andrew Morton <***@linux-foundation.org>
Cc: David Rientjes <***@google.com>
Cc: "Jérôme Glisse" <***@redhat.com>
Cc: Michal Hocko <***@suse.com>
Cc: "Christian König" <***@amd.com>
Cc: Greg Kroah-Hartman <***@linuxfoundation.org>
Cc: Daniel Vetter <***@ffwll.ch>
Cc: Mike Rapoport <***@linux.vnet.ibm.com>
Cc: linux-***@kvack.org
Signed-off-by: Daniel Vetter <***@intel.com>
---
include/linux/mmu_notifier.h | 6 ++++++
mm/mmu_notifier.c | 7 +++++++
2 files changed, 13 insertions(+)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 9893a6432adf..19be442606c6 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -12,6 +12,10 @@ struct mmu_notifier_ops;

#ifdef CONFIG_MMU_NOTIFIER

+#ifdef CONFIG_LOCKDEP
+extern struct lockdep_map __mmu_notifier_invalidate_range_start_map;
+#endif
+
/*
* The mmu notifier_mm structure is allocated and installed in
* mm->mmu_notifier_mm inside the mm_take_all_locks() protected
@@ -267,8 +271,10 @@ static inline void mmu_notifier_change_pte(struct mm_struct *mm,
static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm,
unsigned long start, unsigned long end)
{
+ lock_map_acquire(&__mmu_notifier_invalidate_range_start_map);
if (mm_has_notifiers(mm))
__mmu_notifier_invalidate_range_start(mm, start, end, true);
+ lock_map_release(&__mmu_notifier_invalidate_range_start_map);
}

static inline int mmu_notifier_invalidate_range_start_nonblock(struct mm_struct *mm,
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index a50ed7d1ecef..c91d58fe388b 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -23,6 +23,13 @@
/* global SRCU for all MMs */
DEFINE_STATIC_SRCU(srcu);

+#ifdef CONFIG_LOCKDEP
+struct lockdep_map __mmu_notifier_invalidate_range_start_map = {
+ .name = "mmu_notifier_invalidate_range_start"
+};
+EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range_start_map);
+#endif
+
/*
* This function allows mmu_notifier::release callback to delay a call to
* a function that will free appropriate resources. The function must be
--
2.20.0.rc1
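
To make the benefit concrete, here is a hypothetical driver pattern the
map lets lockdep catch (all names below are illustrative, not taken
from any real driver): the notifier callback takes a driver lock, and a
teardown path holds that same lock around a call that itself
invalidates a range. From any plain munmap() of a tracked range lockdep
learns the map -> lock edge, from the teardown path it learns the
lock -> map edge, and it can then report the inversion without the two
paths ever having to collide in a single run:

struct example_dev {
	struct mutex lock;
	struct mmu_notifier mn;
};

static int example_invalidate_range_start(struct mmu_notifier *mn,
					  struct mm_struct *mm,
					  unsigned long start,
					  unsigned long end,
					  bool blockable)
{
	struct example_dev *edev = container_of(mn, struct example_dev, mn);

	/* lockdep records: invalidate_range_start map -> edev->lock */
	mutex_lock(&edev->lock);
	/* ... drop the device's references to [start, end) ... */
	mutex_unlock(&edev->lock);
	return 0;
}

static void example_teardown(struct example_dev *edev,
			     struct vm_area_struct *vma)
{
	/*
	 * lockdep records: edev->lock -> invalidate_range_start map,
	 * because zap_vma_ptes() ends up in invalidate_range_start().
	 */
	mutex_lock(&edev->lock);
	zap_vma_ptes(vma, vma->vm_start, vma->vm_end - vma->vm_start);
	mutex_unlock(&edev->lock);
}
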
Michal Hocko
2018-12-10 13:27:59 UTC
Just a bit of paranoia, since if we start pushing this deep into
callchains it's hard to spot all places where an mmu notifier
implementation might fail when it's not allowed to.
Inspired by some confusion we had discussing i915 mmu notifiers and
whether we could use the newly-introduced return value to handle some
corner cases. Until we realized that these are only for when a task
has been killed by the oom reaper.
An alternative approach would be to split the callback into two
versions, one with the int return value, and the other with void
return value like in older kernels. But that's a lot more churn for
fairly little gain I think.
Summary from the m-l discussion on why we want something at warning
level: This allows automated tooling in CI to catch bugs without
humans having to look at everything. If we just upgrade the existing
pr_info to a pr_warn, then we'll have false positives. And as-is, no
one will ever spot the problem since it's lost in the massive amounts
of overall dmesg noise.
OK, fair enough. If this is going to help with testing then I do not
have any objections of course.
v2: Drop the full WARN_ON backtrace in favour of just a pr_warn for
the problematic case (Michal Hocko).
Thanks!
---
mm/mmu_notifier.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 5119ff846769..ccc22f21b735 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -190,6 +190,9 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
pr_info("%pS callback failed with %d in %sblockable context.\n",
mn->ops->invalidate_range_start, _ret,
!blockable ? "non-" : "");
+ if (blockable)
+ pr_warn("%pS callback failure not allowed\n",
+ mn->ops->invalidate_range_start);
ret = _ret;
}
}
--
2.20.0.rc1
--
Michal Hocko
SUSE Labs
Peter Zijlstra
2018-12-10 16:30:09 UTC
OK, no real objections to the thing. Just so long we're all on the same
page as to what it does and doesn't do ;-)
I am not really sure whether there are other potential users besides
this one and whether the check as such is justified.
It's a debug option...
I suppose you could extend the check to include schedule_debug() as well.
Do you mean to make the check cheaper?
Nah, so the patch only touched might_sleep(), the below touches
schedule().

If there were a path that hits schedule() without going through a
might_sleep() (rare in practice I think, but entirely possible) then you
wouldn't get a splat without something like the below on top.
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f66920173370..b1aaa278f1af 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3278,13 +3278,18 @@ static noinline void __schedule_bug(struct task_struct *prev)
/*
* Various schedule()-time debugging checks and statistics:
*/
-static inline void schedule_debug(struct task_struct *prev)
+static inline void schedule_debug(struct task_struct *prev, bool preempt)
{
#ifdef CONFIG_SCHED_STACK_END_CHECK
if (task_stack_end_corrupted(prev))
panic("corrupted stack end detected inside scheduler\n");
#endif
+#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
+ if (!preempt && prev->state && prev->non_block_count)
+ // splat
+#endif
+
if (unlikely(in_atomic_preempt_off())) {
__schedule_bug(prev);
preempt_count_set(PREEMPT_DISABLED);
@@ -3391,7 +3396,7 @@ static void __sched notrace __schedule(bool preempt)
rq = cpu_rq(cpu);
prev = rq->curr;
- schedule_debug(prev);
+ schedule_debug(prev, preempt);
if (sched_feat(HRTICK))
hrtick_clear(rq);
--
Michal Hocko
SUSE Labs
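
For illustration only, the "// splat" placeholder in the hunk above
could be filled in along these lines (a sketch of the idea, not the
actual follow-up patch; the exact message text is an assumption):

#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
	if (!preempt && prev->state && prev->non_block_count) {
		printk(KERN_ERR "BUG: scheduling in a non-blocking section: %s/%d/%i\n",
		       prev->comm, prev->pid, prev->non_block_count);
		dump_stack();
	}
#endif
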
Koenig, Christian
2018-12-10 10:44:38 UTC
Patches #1 and #3 are Reviewed-by: Christian König
<***@amd.com>

Patch #2 is Acked-by: Christian König <***@amd.com> because
I can't judge if adding the counter in the thread structure is actually
a good idea.

In patch #4 I honestly don't understand at all how this stuff works, so
no comment from my side on this.

Christian.
Just a bit of paranoia, since if we start pushing this deep into
callchains it's hard to spot all places where an mmu notifier
implementation might fail when it's not allowed to.
Inspired by some confusion we had discussing i915 mmu notifiers and
whether we could use the newly-introduced return value to handle some
corner cases. Until we realized that these are only for when a task
has been killed by the oom reaper.
An alternative approach would be to split the callback into two
versions, one with the int return value, and the other with void
return value like in older kernels. But that's a lot more churn for
fairly little gain I think.
Summary from the m-l discussion on why we want something at warning
level: This allows automated tooling in CI to catch bugs without
humans having to look at everything. If we just upgrade the existing
pr_info to a pr_warn, then we'll have false positives. And as-is, no
one will ever spot the problem since it's lost in the massive amounts
of overall dmesg noise.
v2: Drop the full WARN_ON backtrace in favour of just a pr_warn for
the problematic case (Michal Hocko).
---
mm/mmu_notifier.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 5119ff846769..ccc22f21b735 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -190,6 +190,9 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
pr_info("%pS callback failed with %d in %sblockable context.\n",
mn->ops->invalidate_range_start, _ret,
!blockable ? "non-" : "");
+ if (blockable)
+ pr_warn("%pS callback failure not allowed\n",
+ mn->ops->invalidate_range_start);
ret = _ret;
}
}