Bug 101573

Summary: GP107 crash with no HDMI connected on 4.12.rc6
Product: xorg
Reporter: 13t8Pm490DD44eZ <harry_x>
Component: Driver/nouveau
Assignee: Nouveau Project <nouveau>
Status: RESOLVED DUPLICATE
QA Contact: Xorg Project Team <xorg-team>
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: Other   
OS: All   
Attachments:
Description Flags
dmesg output none

Description 13t8Pm490DD44eZ 2017-06-24 05:14:08 UTC
When starting the machine on a Dell Inspiron 7000 (Kaby Lake, GTX 1050 Ti) with an HDMI monitor connected (HDMI output is provided by the NVIDIA card, eDP is connected to internal), everything seems to work (using reverse PRIME, which is set up automatically). There is a lot of tearing, but it works.

But when starting the machine without HDMI output connected (so the NVIDIA card has no connected output), it fails with timeouts:

čen 23 22:11:03 blackgate kernel: nouveau 0000:01:00.0: DRM: resuming object tree...
čen 23 22:11:03 blackgate kernel: nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 409800 [ TIMEOUT ]
čen 23 22:11:05 blackgate kernel: nouveau 0000:01:00.0: timeout
čen 23 22:11:05 blackgate kernel: ------------[ cut here ]------------
čen 23 22:11:05 blackgate kernel: WARNING: CPU: 6 PID: 138 at drivers/gpu/drm/nouveau/nvkm/engine/gr/gf100.c:1501 gf100_gr_init_ctxctl+0x81f/0x9a0 [nouveau]
čen 23 22:11:05 blackgate kernel: Modules linked in: xt_nat veth xfs ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter bridge stp llc tun dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio loop ecryptfs cbc encrypted_keys trusted ctr ccm cmac rfcomm pci_stub vboxpci(O) xt_tcpudp iptable_filter iptable_nat nf_conntrack_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_defrag_ipv6 nf_nat_ipv4 xt_conntrack nf_nat nf_conntrack ip6table_filter ip6_tables libcrc32c crc32c_generic bnep joydev mousedev uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 hid_multitouch videobuf2_core videodev btusb btrtl media fuse snd_hda_codec_realtek snd_hda_codec_generic arc4 iTCO_wdt iTCO_vendor_support nls_iso8859_1 nls_cp437 vfat fat i2c_designware_platform iwlmvm i2c_designware_core
čen 23 22:11:05 blackgate kernel:  dell_wmi mac80211 dell_laptop dell_smbios dcdbas intel_rapl dell_smm_hwmon x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iwlwifi irqbypass crct10dif_pclmul snd_hda_intel cfg80211 r8168(O) ghash_clmulni_intel snd_hda_codec intel_cstate intel_rapl_perf snd_hda_core psmouse snd_hwdep hci_uart input_leds snd_pcm btbcm idma64 btqca mei_me snd_timer btintel i2c_hid processor_thermal_device snd i2c_i801 mei soundcore intel_lpss_pci intel_pch_thermal bluetooth shpchp intel_soc_dts_iosf thermal hid battery int3403_thermal ecdh_generic tpm_crb rfkill intel_lpss_acpi intel_lpss int3402_thermal int340x_thermal_zone evdev intel_hid mac_hid int3400_thermal acpi_als sparse_keymap tpm_tis acpi_thermal_rel kfifo_buf ac tpm_tis_core industrialio acpi_pad tpm sch_fq_codel msr vboxnetadp(O)
čen 23 22:11:05 blackgate kernel:  vboxnetflt(O) vboxdrv(O) i2c_dev sg ip_tables x_tables sd_mod ext4 crc16 jbd2 fscrypto mbcache dm_mod dax serio_raw atkbd libps2 crc32_pclmul crc32c_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper ahci libahci nouveau(-) libata led_class mxm_wmi scsi_mod xhci_pci ttm i8042 serio wmi i915 video button intel_gtt i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm xhci_hcd usbcore usb_common nvme nvme_core
čen 23 22:11:05 blackgate kernel: CPU: 6 PID: 138 Comm: kworker/6:2 Tainted: G           O    4.12.0-rc6-mainline #1
čen 23 22:11:05 blackgate kernel: Hardware name: Dell Inc. Inspiron 15 7000 Gaming/065C71, BIOS 01.00.05 03/01/2017
čen 23 22:11:05 blackgate kernel: Workqueue: pm pm_runtime_work
čen 23 22:11:05 blackgate kernel: task: ffff88085a909d80 task.stack: ffffc90003abc000
čen 23 22:11:05 blackgate kernel: RIP: 0010:gf100_gr_init_ctxctl+0x81f/0x9a0 [nouveau]
čen 23 22:11:05 blackgate kernel: RSP: 0018:ffffc90003abfa08 EFLAGS: 00010286
čen 23 22:11:05 blackgate kernel: RAX: 000000000000001d RBX: ffff88085aa2f830 RCX: 0000000000000000
čen 23 22:11:05 blackgate kernel: RDX: 0000000000000000 RSI: ffff88087f58dcc8 RDI: ffff88087f58dcc8
čen 23 22:11:05 blackgate kernel: RBP: ffffc90003abfa38 R08: 00000000000003e2 R09: 0000000000000004
čen 23 22:11:05 blackgate kernel: R10: ffffc90003abf8a8 R11: 0000000000000001 R12: ffff880851cd0000
čen 23 22:11:05 blackgate kernel: R13: 0000000077363ba0 R14: ffff8808526ddba0 R15: 0000001cebd02520
čen 23 22:11:05 blackgate kernel: FS:  0000000000000000(0000) GS:ffff88087f580000(0000) knlGS:0000000000000000
čen 23 22:11:05 blackgate kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
čen 23 22:11:05 blackgate kernel: CR2: 000000000232146c CR3: 0000000001a09000 CR4: 00000000003406e0
čen 23 22:11:05 blackgate kernel: Call Trace:
čen 23 22:11:05 blackgate kernel:  gp100_gr_init+0x6f0/0x720 [nouveau]
čen 23 22:11:05 blackgate kernel:  gf100_gr_init_+0x55/0x60 [nouveau]
čen 23 22:11:05 blackgate kernel:  nvkm_gr_init+0x17/0x20 [nouveau]
čen 23 22:11:05 blackgate kernel:  nvkm_engine_init+0x68/0x1f0 [nouveau]
čen 23 22:11:05 blackgate kernel:  nvkm_subdev_init+0xb0/0x200 [nouveau]
čen 23 22:11:05 blackgate kernel:  nvkm_device_init+0x13c/0x270 [nouveau]
čen 23 22:11:05 blackgate kernel:  nvkm_udevice_init+0x48/0x60 [nouveau]
čen 23 22:11:05 blackgate kernel:  nvkm_object_init+0x3f/0x190 [nouveau]
čen 23 22:11:05 blackgate kernel:  nvkm_object_init+0xa3/0x190 [nouveau]


The stack trace is different every time, for example:
čen 23 22:11:07 blackgate kernel: nouveau 0000:01:00.0: timeout
čen 23 22:11:07 blackgate kernel: ------------[ cut here ]------------
čen 23 22:11:07 blackgate kernel: WARNING: CPU: 6 PID: 138 at drivers/gpu/drm/nouveau/nvkm/subdev/mmu/gf100.c:190 gf100_vm_flush+0x1ab/0x1c0 [nouveau]
čen 23 22:11:07 blackgate kernel: Modules linked in: xt_nat veth xfs ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter bridge stp llc tun dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio loop ecryptfs cbc encrypted_keys trusted ctr ccm cmac rfcomm pci_stub vboxpci(O) xt_tcpudp iptable_filter iptable_nat nf_conntrack_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_defrag_ipv6 nf_nat_ipv4 xt_conntrack nf_nat nf_conntrack ip6table_filter ip6_tables libcrc32c crc32c_generic bnep joydev mousedev uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 hid_multitouch videobuf2_core videodev btusb btrtl media fuse snd_hda_codec_realtek snd_hda_codec_generic arc4 iTCO_wdt iTCO_vendor_support nls_iso8859_1 nls_cp437 vfat fat i2c_designware_platform iwlmvm i2c_designware_core
čen 23 22:11:07 blackgate kernel:  dell_wmi mac80211 dell_laptop dell_smbios dcdbas intel_rapl dell_smm_hwmon x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iwlwifi irqbypass crct10dif_pclmul snd_hda_intel cfg80211 r8168(O) ghash_clmulni_intel snd_hda_codec intel_cstate intel_rapl_perf snd_hda_core psmouse snd_hwdep hci_uart input_leds snd_pcm btbcm idma64 btqca mei_me snd_timer btintel i2c_hid processor_thermal_device snd i2c_i801 mei soundcore intel_lpss_pci intel_pch_thermal bluetooth shpchp intel_soc_dts_iosf thermal hid battery int3403_thermal ecdh_generic tpm_crb rfkill intel_lpss_acpi intel_lpss int3402_thermal int340x_thermal_zone evdev intel_hid mac_hid int3400_thermal acpi_als sparse_keymap tpm_tis acpi_thermal_rel kfifo_buf ac tpm_tis_core industrialio acpi_pad tpm sch_fq_codel msr vboxnetadp(O)
čen 23 22:11:07 blackgate kernel:  vboxnetflt(O) vboxdrv(O) i2c_dev sg ip_tables x_tables sd_mod ext4 crc16 jbd2 fscrypto mbcache dm_mod dax serio_raw atkbd libps2 crc32_pclmul crc32c_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper ahci libahci nouveau(-) libata led_class mxm_wmi scsi_mod xhci_pci ttm i8042 serio wmi i915 video button intel_gtt i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm xhci_hcd usbcore usb_common nvme nvme_core
čen 23 22:11:07 blackgate kernel: CPU: 6 PID: 138 Comm: kworker/6:2 Tainted: G        W  O    4.12.0-rc6-mainline #1
čen 23 22:11:07 blackgate kernel: Hardware name: Dell Inc. Inspiron 15 7000 Gaming/065C71, BIOS 01.00.05 03/01/2017
čen 23 22:11:07 blackgate kernel: Workqueue: pm pm_runtime_work
čen 23 22:11:07 blackgate kernel: task: ffff88085a909d80 task.stack: ffffc90003abc000
čen 23 22:11:07 blackgate kernel: RIP: 0010:gf100_vm_flush+0x1ab/0x1c0 [nouveau]
čen 23 22:11:07 blackgate kernel: RSP: 0018:ffffc90003abf7e8 EFLAGS: 00010282
čen 23 22:11:07 blackgate kernel: RAX: 000000000000001d RBX: ffff88085aa2f830 RCX: 0000000000000000
čen 23 22:11:07 blackgate kernel: RDX: 0000000000000000 RSI: ffff88087f58dcc8 RDI: ffff88087f58dcc8
čen 23 22:11:07 blackgate kernel: RBP: ffffc90003abf828 R08: 0000000000000415 R09: 0000000000000004
čen 23 22:11:07 blackgate kernel: R10: ffffffffa05be080 R11: 0000000000000001 R12: ffff880853d72c00
čen 23 22:11:07 blackgate kernel: R13: ffff8808526ddba0 R14: ffff88085ae2f420 R15: 0000001d6343fb00
čen 23 22:11:07 blackgate kernel: FS:  0000000000000000(0000) GS:ffff88087f580000(0000) knlGS:0000000000000000
čen 23 22:11:07 blackgate kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
čen 23 22:11:07 blackgate kernel: CR2: 000000000232146c CR3: 0000000001a09000 CR4: 00000000003406e0
čen 23 22:11:07 blackgate kernel: Call Trace:
čen 23 22:11:07 blackgate kernel:  nvkm_vm_map_at+0x189/0x1a0 [nouveau]
čen 23 22:11:07 blackgate kernel:  nv50_instobj_map+0x1b/0x20 [nouveau]
čen 23 22:11:07 blackgate kernel:  nv50_instobj_boot+0x89/0x100 [nouveau]
čen 23 22:11:07 blackgate kernel:  nv50_instobj_acquire+0x4b/0x70 [nouveau]
čen 23 22:11:07 blackgate kernel:  nvkm_instobj_acquire_slow+0x17/0x30 [nouveau]
čen 23 22:11:07 blackgate kernel:  nvkm_instobj_new+0x6e/0x180 [nouveau]
čen 23 22:11:07 blackgate kernel:  nvkm_memory_new+0x44/0x80 [nouveau]
čen 23 22:11:07 blackgate kernel:  nvkm_vm_get+0x14a/0x240 [nouveau]
čen 23 22:11:07 blackgate kernel:  nvkm_gpuobj_map+0x33/0x60 [nouveau]
čen 23 22:11:07 blackgate kernel:  gm200_secboot_run_blob+0x8d/0x180 [nouveau]
čen 23 22:11:07 blackgate kernel:  ? flush_work+0x3f/0x1b0
čen 23 22:11:07 blackgate kernel:  gp102_secboot_run_blob+0x1d9/0x2e0 [nouveau]
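
Since the traces differ from run to run, it can help to reduce each splat to the site that raised the warning and compare those. The following is only a rough sketch, not part of the original report: the function name `warning_sites` and the regular expression are illustrative assumptions, written against the "WARNING: CPU: ... at file:line symbol+0x.../0x... [module]" header format seen in the log lines above.

```python
import re

# Matches the header line the kernel prints at the top of each warning splat,
# e.g. "WARNING: CPU: 6 PID: 138 at path/file.c:1501 symbol+0x81f/0x9a0 [nouveau]".
# The pattern is an assumption based on the log excerpts in this report.
WARNING_RE = re.compile(
    r"WARNING: CPU: \d+ PID: \d+ at (?P<file>\S+):(?P<line>\d+) "
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<module>\w+)\])?"
)


def warning_sites(log_lines):
    """Return a (file, line, symbol, module) tuple for each WARNING header."""
    sites = []
    for text in log_lines:
        match = WARNING_RE.search(text)
        if match:
            sites.append(
                (
                    match.group("file"),
                    int(match.group("line")),
                    match.group("symbol"),
                    match.group("module"),
                )
            )
    return sites
```

Feeding it the lines of a saved journal or dmesg dump (for example `warning_sites(open("dmesg.txt", encoding="utf-8"))`, where `dmesg.txt` is a hypothetical file name) gives one tuple per warning, which makes it easier to see which code paths time out.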
Comment 1 13t8Pm490DD44eZ 2017-06-24 05:41:26 UTC
Created attachment 132209 [details]
dmesg output
Comment 2 13t8Pm490DD44eZ 2017-06-24 05:43:08 UTC
Versions:

ArchLinux
Kernel 4.12.rc6
xf86-video-nouveau 1.0.15-1 
xf86-video-intel 1:2.99.917+777+g6babcf15-1
mesa 17.1.3-1 

All are the latest Arch Linux packages.
Comment 3 Rhys Kidd 2017-06-24 14:36:30 UTC
Hi harry_x,

This timeout fault with the GP107 (nv137) is already being tracked in bz#100228.

It's helpful that you've seen in testing that the NVIDIA card is brought up when an HDMI output is connected to it, but not when the NVIDIA card has no connected output.

*** This bug has been marked as a duplicate of bug 100228 ***
Comment 4 Ilia Mirkin 2017-06-24 14:54:57 UTC
This is actually reminiscent of an issue some laptops had a while ago where without HDMI plugged in on boot, the board would be *totally off*, i.e. not even on the PCI bus. And no real way that we could figure out to get it back... after plugging in HDMI one had to do a pci bus rescan, and it would appear.

Probably worth checking if those other people don't have a similar situation.
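
For anyone wanting to check for that older "totally off" situation, a minimal sketch follows. It assumes the standard Linux sysfs layout (`/sys/bus/pci/devices/<address>` and `/sys/bus/pci/rescan`); the helper names and the `sysfs_root` parameter are hypothetical and only there to keep the example testable. The address `0000:01:00.0` is the one from the log in this report.

```python
from pathlib import Path


def pci_device_present(address, sysfs_root="/sys"):
    """Return True if the PCI device (e.g. "0000:01:00.0") is listed in sysfs."""
    return (Path(sysfs_root) / "bus/pci/devices" / address).is_dir()


def pci_rescan(sysfs_root="/sys"):
    """Ask the kernel to rescan all PCI buses; needs root on a real system."""
    (Path(sysfs_root) / "bus/pci/rescan").write_text("1")
```

A possible use: call `pci_device_present("0000:01:00.0")` after booting without HDMI, and if it returns False, plug in HDMI, call `pci_rescan()` as root, and check again.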
Comment 5 13t8Pm490DD44eZ 2017-06-24 15:25:31 UTC
Hello Rhys and Ilia, 

thank you for your feedback. Is there anything I can do to help you track down this issue? I am a senior C developer with kernel driver development experience, but I know nothing about the NVIDIA card architecture, so I probably can't fix it myself... But if there is any way I can help with debugging this problem, I would be happy to help. Right now it works with the NVIDIA blob, but I am quite unhappy about that setup; I would much prefer to use the open source driver for many reasons...

I am not sure it is the same issue regarding PCI bus, I can see the device normally in lspci -vv, but I don't know if that means anything...
Comment 6 Ilia Mirkin 2017-06-24 15:55:01 UTC
(In reply to harry_x from comment #5)
> But if there is any way I can help with
> debugging this problem, I would be happy to help.

Join us in #nouveau on irc.freenode.net. Unfortunately with these newer boards, we understand very little about their initialization, esp wrt secboot.

> I am not sure it is the same issue regarding PCI bus, I can see the device
> normally in lspci -vv, but I don't know if that means anything...

Yeah, it's clearly a different issue. Just seemed a little reminiscent -- without HDMI plugged in, gr init times out because it feels like the GPU is semi-off.
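
The traces in the description run from the `pm_runtime_work` workqueue, so one thing that may be worth recording alongside them is the runtime power management state the kernel reports for the card. The sketch below is an assumption about what is useful to look at, not a known fix: it only reads the standard `power/runtime_status` and `power/control` sysfs attributes, and the function name and `sysfs_root` parameter are hypothetical.

```python
from pathlib import Path


def runtime_pm_state(address, sysfs_root="/sys"):
    """Read the runtime PM attributes of one PCI device from sysfs.

    Returns a dict with "runtime_status" and "control"; a value is None
    when the corresponding attribute file does not exist.
    """
    power = Path(sysfs_root) / "bus/pci/devices" / address / "power"
    state = {}
    for name in ("runtime_status", "control"):
        attr = power / name
        state[name] = attr.read_text().strip() if attr.is_file() else None
    return state
```

Calling `runtime_pm_state("0000:01:00.0")` with and without HDMI connected would show whether the device sits in a different runtime PM state in the failing case.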
