It seems that there is a regression in 3.5+ kernels with nouveau on a GTX 560 card: X does not start, and an artifact (a rectangle in the top right) appears during boot. Reverting the commit below resolves the issue: X can be started and the artifact is gone. Attached is the kern.log from the machine booted with drm.debug=0x04.

1a46098e910b96337f0fe3838223db43b923bad4 is the first bad commit
commit 1a46098e910b96337f0fe3838223db43b923bad4
Author: Ben Skeggs <bskeggs@redhat.com>
Date:   Fri May 4 15:17:28 2012 +1000

    drm/nvc0/ttm: use copy engines for async buffer moves

    Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Created attachment 65623: kern.log
Based on your description of the problem and the attached kern.log, this seems to be a duplicate of my bug, number 53101. Thanks for fixing the regression. Hopefully your patch gets accepted upstream and integrated into distro kernels sooner rather than later.
https://bugs.freedesktop.org/show_bug.cgi?id=53101

*** This bug has been marked as a duplicate of bug 53101 ***
Created attachment 66162: restrict pce usage based on punits values

Re-opening, as the bug this was marked a duplicate of is a mess and could possibly involve multiple issues. I've attached a patch which should help at least users with NVCE (GF114) chipsets.
Ben, I am able to boot with 3.5.3 + this patch, and so far I see no other issues. If I can be of any other help, please let me know.
I am using 3.6.0-rc3 with the related patch that showed up in git recently, and everything is working great on my card now.
Unfortunately this is not completely fixed. I am still getting the same problem after 1-2 days of uptime: X crashes and gets stuck in a restart loop, and I am forced to reboot.

dmesg:
[225619.491763] [drm] nouveau 0000:01:00.0: PFIFO: read fault at 0x0008028000 [PAGE_NOT_PRESENT] from PFIFO/PFIFO on channel 0x000013a000
[225623.000984] [drm] nouveau 0000:01:00.0: GPU lockup - switching to software fbcon
[225626.049326] [drm] nouveau 0000:01:00.0: Failed to idle channel 1.
[225629.047337] [drm] nouveau 0000:01:00.0: Failed to idle channel 2.
[225634.044033] [drm] nouveau 0000:01:00.0: Failed to idle channel 4.
[225637.042033] [drm] nouveau 0000:01:00.0: Failed to idle channel 3.
[225646.703625] [drm] nouveau 0000:01:00.0: Failed to idle channel 1.
[225649.701636] [drm] nouveau 0000:01:00.0: Failed to idle channel 2.
[225659.263372] [drm] nouveau 0000:01:00.0: Failed to idle channel 1.
[225662.261307] [drm] nouveau 0000:01:00.0: Failed to idle channel 2.
[225671.806976] [drm] nouveau 0000:01:00.0: Failed to idle channel 1.
[225674.804998] [drm] nouveau 0000:01:00.0: Failed to idle channel 2.
I tried the latest git and hit a bug in under 24 hours :(

Sep 16 11:19:32 desktop kernel: [33219.647922] nouveau W[ PFIFO][0000:01:00.0] unknown status 0x40000000
Sep 16 12:04:47 desktop kernel: [35932.327284] BUG: unable to handle kernel NULL pointer dereference at 0000000000000012
Sep 16 12:04:47 desktop kernel: [35932.327317] IP: [<ffffffffa0498cc5>] nouveau_mm_free+0x85/0x180 [nouveau]
Sep 16 12:04:47 desktop kernel: [35932.327357] PGD 222c3d067 PUD 2220d6067 PMD 0
Sep 16 12:04:47 desktop kernel: [35932.327375] Oops: 0002 [#1] PREEMPT SMP
Sep 16 12:04:47 desktop kernel: [35932.327392] Modules linked in: tun bnep rfcomm bluetooth rfkill pci_stub vboxpci(O) vboxnetadp(O) cpufreq_stats parport_pc vboxnetflt(O) ppdev lp parport vboxdrv(O) binfmt_misc zram(C) zsmalloc(C) nfsd exportfs auth_rpcgss nfs_acl nfs lockd fscache sunrpc fuse ext3 jbd sha256_generic aes_x86_64 aes_generic cbc dm_crypt sbs sbshc max6650 loop firewire_sbp2 snd_hda_codec_hdmi joydev powernow_k8 hid_generic mperf snd_hda_codec_realtek snd_usb_audio snd_usbmidi_lib snd_seq_midi snd_seq_midi_event snd_rawmidi kvm_amd kvm evdev edac_mce_amd microcode pcspkr edac_core psmouse serio_raw k10temp nouveau snd_hda_intel mxm_wmi snd_hda_codec video i2c_piix4 ttm snd_hwdep drm_kms_helper snd_pcm drm snd_page_alloc snd_seq i2c_algo_bit snd_seq_device i2c_core snd_timer snd soundcore nvidiafb vgastate processor wmi button thermal_sys ext4 crc16 jbd2 mbcache btrfs libcrc32c zlib_deflate dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 mult
Sep 16 12:04:47 desktop kernel: ipath linear md_mod usbhid hid firewire_ohci firewire_core r8169 sg sd_mod crc_t10dif ohci_hcd crc32c_intel xhci_hcd crc_itu_t mii ehci_hcd ahci libahci usbcore libata usb_common scsi_mod
Sep 16 12:04:47 desktop kernel: [35932.327873] CPU 6
Sep 16 12:04:47 desktop kernel: [35932.327884] Pid: 4607, comm: kwin Tainted: G C O 3.6.0-rc5-drmgit02+ #1 To be filled by O.E.M. To be filled by O.E.M./SABERTOOTH 990FX
Sep 16 12:04:47 desktop kernel: [35932.327907] RIP: 0010:[<ffffffffa0498cc5>] [<ffffffffa0498cc5>] nouveau_mm_free+0x85/0x180 [nouveau]
Sep 16 12:04:47 desktop kernel: [35932.327943] RSP: 0018:ffff88021db27c28 EFLAGS: 00010246
Sep 16 12:04:47 desktop kernel: [35932.327956] RAX: 0000000000000000 RBX: ffff8802241ad498 RCX: ffff88021dfb08c0
Sep 16 12:04:47 desktop kernel: [35932.327970] RDX: 000000000000000a RSI: dead000000100100 RDI: dead000000200200
Sep 16 12:04:47 desktop kernel: [35932.327985] RBP: ffff880045fff180 R08: 00000000000165a0 R09: ffff88022ed965a0
Sep 16 12:04:47 desktop kernel: [35932.328000] R10: ffffea00016f74c0 R11: ffffffffa0498da9 R12: ffff88021dfb08c0
Sep 16 12:04:47 desktop kernel: [35932.328014] R13: ffff88021db27c60 R14: ffff8802241ad420 R15: ffff88014af16040
Sep 16 12:04:47 desktop kernel: [35932.328030] FS: 00007fba186d0780(0000) GS:ffff88022ed80000(0000) knlGS:00000000f10fbb70
Sep 16 12:04:47 desktop kernel: [35932.328046] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 16 12:04:47 desktop kernel: [35932.328058] CR2: 0000000000000012 CR3: 0000000222fe1000 CR4: 00000000000407e0
Sep 16 12:04:47 desktop kernel: [35932.328080] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 16 12:04:47 desktop kernel: [35932.328097] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 16 12:04:47 desktop kernel: [35932.328115] Process kwin (pid: 4607, threadinfo ffff88021db26000, task ffff88021ca9d100)
Sep 16 12:04:47 desktop kernel: [35932.328133] Stack:
Sep 16 12:04:47 desktop kernel: [35932.328141] 0000000000000000 ffff88014a07a900 ffff88014a07a9c0 ffff8802241ad498
Sep 16 12:04:47 desktop kernel: [35932.328185] ffff8802241ad400 ffffffffa04ac29f 0000000000008000 ffff88005bdd3e00
Sep 16 12:04:47 desktop kernel: [35932.328215] ffff88014af16000 ffff880221df1178 ffff880221df1580 ffff8802245f7378
Sep 16 12:04:47 desktop kernel: [35932.328237] Call Trace:
Sep 16 12:04:47 desktop kernel: [35932.328270] [<ffffffffa04ac29f>] ? nv50_fb_vram_del+0x9f/0xe0 [nouveau]
Sep 16 12:04:47 desktop kernel: [35932.328296] [<ffffffffa043f786>] ? ttm_bo_cleanup_memtype_use+0x66/0xa0 [ttm]
Sep 16 12:04:47 desktop kernel: [35932.328321] [<ffffffffa044091c>] ? ttm_bo_release+0x1dc/0x220 [ttm]
Sep 16 12:04:47 desktop kernel: [35932.328344] [<ffffffffa0440995>] ? ttm_bo_unref+0x35/0x60 [ttm]
Sep 16 12:04:47 desktop kernel: [35932.328388] [<ffffffffa05037b2>] ? nouveau_gem_object_del+0x52/0x80 [nouveau]
Sep 16 12:04:47 desktop kernel: [35932.328416] [<ffffffffa040ec18>] ? drm_gem_handle_delete+0xd8/0x120 [drm]
Sep 16 12:04:47 desktop kernel: [35932.328444] [<ffffffffa040f000>] ? drm_gem_destroy+0x40/0x40 [drm]
Sep 16 12:04:47 desktop kernel: [35932.328468] [<ffffffffa040d164>] ? drm_ioctl+0x3c4/0x460 [drm]
Sep 16 12:04:47 desktop kernel: [35932.328492] [<ffffffff81302e17>] ? sys_recvfrom+0xf7/0x140
Sep 16 12:04:47 desktop kernel: [35932.328508] [<ffffffff81153a01>] ? do_vfs_ioctl+0x81/0x540
Sep 16 12:04:47 desktop kernel: [35932.328524] [<ffffffff811545eb>] ? poll_select_copy_remaining+0xab/0x120
Sep 16 12:04:47 desktop kernel: [35932.328540] [<ffffffff81153f48>] ? sys_ioctl+0x88/0xa0
Sep 16 12:04:47 desktop kernel: [35932.328555] [<ffffffff8140c8f9>] ? system_call_fastpath+0x16/0x1b
Sep 16 12:04:47 desktop kernel: [35932.328569] Code: 89 45 34 8b 41 38 01 45 38 80 79 30 00 75 2b 48 8b 51 10 48 8b 41 18 48 be 00 01 10 00 00 00 ad de 48 bf 00 02 20 00 00 00 ad de <48> 89 42 08 48 89 10 48 89 71 10 48 89 79 18 48 8b 11 48 8b 41
Sep 16 12:04:47 desktop kernel: [35932.328894] RIP [<ffffffffa0498cc5>] nouveau_mm_free+0x85/0x180 [nouveau]
Sep 16 12:04:47 desktop kernel: [35932.328930] RSP <ffff88021db27c28>
Sep 16 12:04:47 desktop kernel: [35932.328940] CR2: 0000000000000012
Sep 16 12:04:47 desktop kernel: [35932.369824] ---[ end trace e66067c6ec707dbe ]---
I had another GPU lockup with 3.5.3 + Ben's patch, this time after ~2 days of uptime and with different error messages:

Sep 18 11:55:27 desktop kernel: [171976.674871] [drm] nouveau 0000:01:00.0: multiple instances of buffer 134 on validation list
Sep 18 11:55:27 desktop kernel: [171976.674906] [drm] nouveau 0000:01:00.0: validate_init
Sep 18 11:55:27 desktop kernel: [171976.674910] [drm] nouveau 0000:01:00.0: validate: -22
Sep 18 11:55:27 desktop kernel: [171976.677029] [drm] nouveau 0000:01:00.0: PFIFO: read fault at 0xab00000000 [PT_NOT_PRESENT] from PGRAPH/GPC1/(unknown enum 0x00000008) on channel 0x0000ea8000
Sep 18 11:55:31 desktop kernel: [171980.505052] [drm] nouveau 0000:01:00.0: GPU lockup - switching to software fbcon
Sep 18 11:55:34 desktop kernel: [171983.512347] [drm] nouveau 0000:01:00.0: Failed to idle channel 1.
Sep 18 11:55:36 desktop kernel: [171985.511208] [drm] nouveau 0000:01:00.0: PFIFO - playlist update failed
Sep 18 11:55:39 desktop kernel: [171988.509023] [drm] nouveau 0000:01:00.0: Failed to idle channel 2.
Sep 18 11:55:41 desktop kernel: [171990.507779] [drm] nouveau 0000:01:00.0: 0x2634 != chid: 0x00100002
Sep 18 11:55:41 desktop kernel: [171990.507890] [drm] nouveau 0000:01:00.0: PFIFO: unknown status 0x000
This had never happened to me before, but my computer just crashed on Gentoo Linux running vanilla kernel 3.7.1 with the nouveau driver. I believe my crash may be similar to the problem in this bug report. I don't have the log, since I can't find it in /var/log, but I took pictures of my monitor. Hopefully this helps. I uploaded the image to imgur: http://i.imgur.com/P4hjL.jpg
Created attachment 72200: crash picture; the stack trace shows nv50_fb_vram_del at the top
The nv50_fb_vram_del kernel crashes are probably fixed by the patch in http://lists.freedesktop.org/archives/nouveau/2013-January/011996.html That issue is probably totally unrelated to the original bug report, though.
(In reply to comment #6)
> Unfortunately this is not completely fixed. I am still getting same problem
> after 1-2 days of uptime, X crashes and stuck in restart loop - forced to
> reboot.
>
> dmesg:
>
>
> [225619.491763] [drm] nouveau 0000:01:00.0: PFIFO: read fault at
> 0x0008028000 [PAGE_NOT_PRESENT] from PFIFO/PFIFO on channel 0x000013a000
> [225623.000984] [drm] nouveau 0000:01:00.0: GPU lockup - switching to
> software fbcon
> [225626.049326] [drm] nouveau 0000:01:00.0: Failed to idle channel 1.
> [225629.047337] [drm] nouveau 0000:01:00.0: Failed to idle channel 2.
> [225634.044033] [drm] nouveau 0000:01:00.0: Failed to idle channel 4.
> [225637.042033] [drm] nouveau 0000:01:00.0: Failed to idle channel 3.
> [225646.703625] [drm] nouveau 0000:01:00.0: Failed to idle channel 1.
> [225649.701636] [drm] nouveau 0000:01:00.0: Failed to idle channel 2.
> [225659.263372] [drm] nouveau 0000:01:00.0: Failed to idle channel 1.
> [225662.261307] [drm] nouveau 0000:01:00.0: Failed to idle channel 2.
> [225671.806976] [drm] nouveau 0000:01:00.0: Failed to idle channel 1.
> [225674.804998] [drm] nouveau 0000:01:00.0: Failed to idle channel 2.

I can reproduce this problem on a desktop NVe7 with kernel 3.10-rc3 plus nouveau git changes on top. About once every one or two days of uptime, the GPU hangs with a PFIFO read or write fault.
This bug has devolved into "I have various issues with nouveau", so I'm closing it. The original problem that Vlad had appears to be fixed, and the logic not to instantiate bogus copy engines remains in the current code. That's not to say that all problems with nouveau are closed, but bugs have to be about one issue at a time :) Feel free to open new issues if bugs remain, but please look at the existing bug list and follow http://nouveau.freedesktop.org/wiki/Bugs/.