Discussion:
[Bug 201273] New: Fatal error during GPU init amdgpu
b***@bugzilla.kernel.org
2018-09-28 16:37:07 UTC
https://bugzilla.kernel.org/show_bug.cgi?id=201273

Bug ID: 201273
Summary: Fatal error during GPU init amdgpu
Product: Drivers
Version: 2.5
Kernel Version: 4.18.9 4.18.10 and possibly earlier
Hardware: x86-64
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Video(DRI - non Intel)
Assignee: drivers_video-***@kernel-bugs.osdl.org
Reporter: ***@freenet.de
Regression: No

Created attachment 278827
--> https://bugzilla.kernel.org/attachment.cgi?id=278827&action=edit
dmesg lsmod lspci lsusb cpuinfo url

Since installing an AMD Radeon RX 560 in an APU-based system, it
sometimes shows a black screen at bootup (the USB keyboard hangs too and SysRq
does not respond, so a reset is needed).
Sometimes the system boots to the GUI, but the console is garbled, dmesg shows
a fatal error during GPU init, and reboot/shutdown hangs.
BIOS: APU set to auto or disabled ("on" not tested)
Attached:
dmesg with/without Error
lsmod with/without Error
lspci / lsusb
/proc/cpuinfo
url to MB and graphics card
--
You are receiving this mail because:
You are watching the assignee of the bug.
b***@bugzilla.kernel.org
2018-10-09 13:00:50 UTC

***@freenet.de changed:

What |Removed |Added
----------------------------------------------------------------------------
Hardware|x86-64 |S390-31
Summary|Fatal error during GPU init |Fatal error during GPU init
|amdgpu |amdgpu RX560
--
b***@bugzilla.kernel.org
2018-10-09 17:36:03 UTC

Alex Deucher (***@gmail.com) changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@gmail.com

--- Comment #1 from Alex Deucher (***@gmail.com) ---
Is this a regression? If so, can you bisect?
--
b***@bugzilla.kernel.org
2018-10-09 21:44:12 UTC

--- Comment #2 from ***@freenet.de ---
Created attachment 278973
--> https://bugzilla.kernel.org/attachment.cgi?id=278973&action=edit
dmesg old + new

The RX560 is new. The error happens only from time to time and only at bootup,
and I mostly have to press reset. While bisecting
https://bugzilla.kernel.org/show_bug.cgi?id=201275
I created two more files with dmesg output. I have included all of them
in the archive. Maybe v4.18.0 is affected too. I could use an older kernel to
verify, but since this is not a predictable case, I would have to use it for
several days to check this. Rarely, the monitor blacks out (really
black, backlight and sound off too), but this may be a hardware issue.
Fiddling with the HDMI plug on the monitor's side resolves that. This may be
relevant if the init code is affected, but I don't think it is.
I could disable CONFIG_DRM_AMDGPU_CIK to test whether this is related too.
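
Because the hang is intermittent, a kernel can only be called "good" after enough clean boots in a row. The following is a minimal sketch of that arithmetic, assuming each boot hangs independently with some probability p_hang. That probability is not known from this report, and the function name and the example values are purely illustrative.

```python
import math


def boots_needed(p_hang: float, confidence: float = 0.95) -> int:
    """Return the smallest n with 1 - (1 - p_hang)**n >= confidence.

    p_hang is an ASSUMED per-boot hang probability (not measured in this
    report); boots are assumed to be independent of each other.
    """
    if not 0.0 < p_hang < 1.0:
        raise ValueError("p_hang must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Solve (1 - p_hang)**n <= 1 - confidence for n and round up.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_hang))


if __name__ == "__main__":
    # Hypothetical hang rates, for illustration only.
    for p in (0.5, 0.1):
        print(f"p_hang={p}: {boots_needed(p)} clean boots for 95% confidence")
```

The rarer the hang, the more clean boots a bisection step needs before it can be marked good, which is why verifying a single kernel can take days on a machine that is booted only once or twice a day.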
--
b***@bugzilla.kernel.org
2018-10-10 15:47:07 UTC

--- Comment #3 from ***@freenet.de ---
(In reply to Alex Deucher from comment #1)
> Is this a regression? If so, can you bisect?

Which is the first release supporting the RX560?
--
b***@bugzilla.kernel.org
2018-10-17 05:56:54 UTC

--- Comment #4 from ***@freenet.de ---
I mounted my monitor on the wall about a week ago, which changed the torsional
moment at the HDMI plug. None of the described errors has occurred
since. So the crash may be triggered by a bad HDMI signal caused by a bad plug
or even a hairline crack in the monitor's board. I have used v4.18.12 and earlier.
--
b***@bugzilla.kernel.org
2018-10-17 09:27:06 UTC

--- Comment #5 from ***@freenet.de ---
It just happened again (v4.18.14).
For now I will attach another monitor with another cable. Checking that may
take more than a week.
--
b***@bugzilla.kernel.org
2018-10-18 13:21:23 UTC

--- Comment #6 from ***@freenet.de ---
Created attachment 279089
--> https://bugzilla.kernel.org/attachment.cgi?id=279089&action=edit
dmesg + amdgpu_pm_info

New monitor and HDMI cable. The bug is not impressed, i.e. the system still
hangs at bootup sometimes (v4.18.14).

Maybe another bug:
The system booted normally, but graphics got stuck later.
dmesg shows error messages at bootup:
[ 9.941637] amdgpu: [powerplay]
failed to send message 148 ret is 0
...

Messages of this kind were triggered again later by using sensors. Graphics
got stuck and cat /sys/.../amdgpu_pm_info took several seconds.
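
To count how often these powerplay messages occur in a saved log, something like the sketch below may help. It assumes the message has exactly the shape quoted above, including the line break after "[powerplay]". The function name and regular expression are illustrative and not part of any amdgpu tooling.

```python
import re

# Matches the message quoted above; \s* also spans the line break that
# follows "[powerplay]" in the log.
POWERPLAY_FAILURE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s*amdgpu: \[powerplay\]\s*"
    r"failed to send message\s+(\w+)\s+ret is\s+(\w+)"
)


def find_powerplay_failures(dmesg_text: str) -> list[tuple[float, str, str]]:
    """Return (timestamp, message id, return value) for each failure found."""
    return [
        (float(ts), msg, ret)
        for ts, msg, ret in POWERPLAY_FAILURE.findall(dmesg_text)
    ]


if __name__ == "__main__":
    sample = (
        "[ 9.941637] amdgpu: [powerplay]\n"
        "failed to send message 148 ret is 0\n"
    )
    for ts, msg, ret in find_powerplay_failures(sample):
        print(f"{ts:.6f}s: message {msg} failed, ret {ret}")
```

Running it over logs taken before and after reading the sensors would show whether the failures cluster around those reads.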
--
b***@bugzilla.kernel.org
2018-10-18 13:22:32 UTC

***@freenet.de changed:

What |Removed |Added
----------------------------------------------------------------------------
Hardware|S390-31 |x86-64
--
b***@bugzilla.kernel.org
2018-10-23 14:00:46 UTC

--- Comment #7 from ***@freenet.de ---
Bug is still alive. v4.19
--
b***@bugzilla.kernel.org
2018-11-05 06:22:09 UTC

--- Comment #8 from ***@freenet.de ---
I replaced the HDMI cable with DisplayPort about 2 weeks ago. No bug visible since.
(Firmware update about 1 week ago.)
Maybe HDMI is broken, or the implementation in the monitors/graphics board is
bad, or the cables are bad, or the implementation in amdgpu is bad.
HDMI with the old monitor and old cable worked well for about 2 years with radeon
and the APU.
--
You are receiving this mail because:
You are watching the assignee of the bug.
b***@bugzilla.kernel.org
2018-11-05 15:39:54 UTC

--- Comment #9 from Alex Deucher (***@gmail.com) ---
Does this patch help?
https://patchwork.freedesktop.org/patch/259364/
--
b***@bugzilla.kernel.org
2018-11-05 19:52:50 UTC

--- Comment #10 from ***@freenet.de ---
> Does this patch help?
> https://patchwork.freedesktop.org/patch/259364/

The bug just took a short leave. It is still alive with v4.19, even with
DisplayPort. I'll check the mentioned patch, but it may not help, as the
mainboard supports PCIe 3.0 (with limitations). However, it may take
several weeks to verify.

https://asrock.com/MB/AMD/FM2A78M%20Pro3+/index.de.asp?cat=Download
- 1 x PCI Express 3.0 x16 Slot (PCIE1 @ x16 mode)
...
*PCIE 3.0 is only supported with FM2+ CPU. With FM2 CPU, it only
supports PCIE 2.0.
--
b***@bugzilla.kernel.org
2018-11-09 17:24:10 UTC

--- Comment #11 from ***@freenet.de ---
Created attachment 279393
--> https://bugzilla.kernel.org/attachment.cgi?id=279393&action=edit
config+dmesg for patched 4.19.1

Bug is still alive.
A more precise description:
The system boots and shows some Plymouth boot messages at low resolution. On a
normal boot the console then switches to high resolution and shows more messages:
Console: switching to colour frame buffer device 240x67
amdgpu 0000:01:00.0: fb0: amdgpudrmfb frame buffer device
This may be the point where the system hangs, because the high-resolution
messages have never been seen when the system crashed.
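
Whether a given boot got past this point could be checked from a saved kernel log with a sketch like the one below. It assumes the two messages appear exactly as quoted above and that a log of the failed boot exists at all, for example from a persistent journal read with `journalctl -k -b -1`. After a hard hang the last lines may never have reached the disk, so a missing marker is only a hint. The function name is illustrative.

```python
# Marker strings taken from the two log lines quoted above.
CONSOLE_MARKER = "Console: switching to colour frame buffer device"
AMDGPU_FB_MARKER = "amdgpudrmfb frame buffer device"


def boot_progress(log_text: str) -> dict[str, bool]:
    """Report which of the two framebuffer messages appear in a kernel log."""
    return {
        "console_switch": CONSOLE_MARKER in log_text,
        "amdgpu_fb": AMDGPU_FB_MARKER in log_text,
    }


if __name__ == "__main__":
    good_boot = (
        "Console: switching to colour frame buffer device 240x67\n"
        "amdgpu 0000:01:00.0: fb0: amdgpudrmfb frame buffer device\n"
    )
    print(boot_progress(good_boot))
    print(boot_progress("some earlier boot messages only\n"))
```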
--
b***@bugzilla.kernel.org
2018-11-21 18:10:00 UTC

--- Comment #12 from ***@freenet.de ---
Bug is still alive. v4.19.3 + patch
--
b***@bugzilla.kernel.org
2018-11-23 16:00:11 UTC

--- Comment #13 from ***@freenet.de ---
(In reply to quirin.blaeser from comment #11)
> Created attachment 279393 [details]
> config+dmesg for patched 4.19.1
> Bug is still alive.
> System boots and shows some plymouth boot messages at low res. At normal
> Console: switching to colour frame buffer device 240x67
> amdgpu 0000:01:00.0: fb0: amdgpudrmfb frame buffer device
> This may be the point where system hangs, because high res messages never
> have been seen, when system crashed.

Addendum:
- The number of Plymouth boot messages in low res is not constant.
- The error message has not been seen for at least 4 weeks.
- Behaviour has changed somewhat:
· old: plymouth low res - clear screen - switch off backlight - system hangs
· new: plymouth low res - clear screen - system hangs
--
b***@bugzilla.kernel.org
2018-11-23 21:34:17 UTC

--- Comment #14 from Alex Deucher (***@gmail.com) ---
Created attachment 279635
--> https://bugzilla.kernel.org/attachment.cgi?id=279635&action=edit
test fix

Does this patch help?
--
b***@bugzilla.kernel.org
2018-11-24 18:03:45 UTC

--- Comment #15 from ***@freenet.de ---
(In reply to Alex Deucher from comment #14)
> Created attachment 279635 [details]
> test fix
> Does this patch help?

I'll check that, but it may require at least a week to verify.
--
b***@bugzilla.kernel.org
2018-11-25 20:41:20 UTC

--- Comment #16 from ***@freenet.de ---
(In reply to quirin.blaeser from comment #15)
> (In reply to Alex Deucher from comment #14)
> > Created attachment 279635 [details]
> > test fix
> > Does this patch help?
> I'll check that, but it may require at least a week to verify.

Bug is still alive: v4.19.3 + patches
--
b***@bugzilla.kernel.org
2018-12-08 07:12:26 UTC

--- Comment #17 from ***@freenet.de ---
Bug is still alive: v4.19.5, no patches
--
b***@bugzilla.kernel.org
2018-12-10 19:51:31 UTC

--- Comment #18 from ***@freenet.de ---
Bug is still alive: v4.19.7
--