Problem:
After upgrading from 4.10.0-21-generic to 4.10.0-22-generic during a routine software update, my system would no longer boot. This issue is reproducible on any kernel newer than 4.10.0-21-generic. The boot screen with the Lubuntu logo would hang instead of advancing, sometimes displaying a black screen.
Details:
I’m running Ubuntu (Lubuntu) 17.04 on a Dell XPS 15 (9560)
    $ lsb_release -a
    No LSB modules are available.
    Distributor ID: Ubuntu
    Description:    Ubuntu 17.04
    Release:        17.04
    Codename:       zesty
    $ uname -a
    Linux user-XPS-15-9560 4.10.0-24-generic #28-Ubuntu SMP Wed Jun 14 08:14:34 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
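To check whether a given machine is running one of the affected kernels, the running version can be compared against 4.10.0-21-generic. This is only a rough sketch: the helper name kernel_newer_than is made up for illustration, and it assumes GNU sort, whose -V flag sorts version strings.

```shell
# Succeeds when version $1 is strictly newer than version $2.
# Assumes GNU sort; -V performs a version-aware sort.
kernel_newer_than() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

running="$(uname -r)"
if kernel_newer_than "$running" "4.10.0-21-generic"; then
    echo "$running is newer than 4.10.0-21-generic"
else
    echo "$running is not newer than 4.10.0-21-generic"
fi
```

A newer kernel does not by itself mean the machine will hang; it only tells you whether you are in the version range where I saw the problem.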
Approach:
None of the recovery options (GRUB -> recovery mode) fully resolved my issue. It’s worth mentioning that I was able to boot the system by entering recovery mode and choosing the resume boot option. This booted the system, but all the font sizes were too large and the resolution of many applications was way off.
I then looked at the system logs to find any errors logged during the boot failure. I assumed this was related to the kernel update just installed, so I started with /var/log/kern.log. I use vim because my .vimrc is set up to recognize the syntax of the log file and highlight errors with a red background color.
    $ vim /var/log/kern.log
The following errors stood out to me:
    user-XPS-15-9560 kernel: [ 0.105708] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 6: ee2000000040110a
    user-XPS-15-9560 kernel: [ 0.105710] mce: [Hardware Error]: TSC 0 ADDR fef1ffc0 MISC 788000c086
    user-XPS-15-9560 kernel: [ 0.105712] mce: [Hardware Error]: PROCESSOR 0:906e9 TIME 1497977011 SOCKET 0 APIC 0 microcode 42
    user-XPS-15-9560 kernel: [ 0.105714] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 7: ee2000000040110a
    user-XPS-15-9560 kernel: [ 0.105714] mce: [Hardware Error]: TSC 0 ADDR fef200c0 MISC 388000c086
    user-XPS-15-9560 kernel: [ 0.105716] mce: [Hardware Error]: PROCESSOR 0:906e9 TIME 1497977011 SOCKET 0 APIC 0 microcode 42
    user-XPS-15-9560 kernel: [ 0.105718] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 8: ee2000000040110a
    user-XPS-15-9560 kernel: [ 0.105718] mce: [Hardware Error]: TSC 0 ADDR fef1ff40 MISC 788000c086
    user-XPS-15-9560 kernel: [ 0.105720] mce: [Hardware Error]: PROCESSOR 0:906e9 TIME 1497977011 SOCKET 0 APIC 0 microcode 42
    user-XPS-15-9560 kernel: [ 0.105721] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 9: ee2000000040110a
    user-XPS-15-9560 kernel: [ 0.105721] mce: [Hardware Error]: TSC 0 ADDR fef1cec0 MISC 4388000c086
    user-XPS-15-9560 kernel: [ 0.105723] mce: [Hardware Error]: PROCESSOR 0:906e9 TIME 1497977011 SOCKET 0 APIC 0 microcode 42
    user-XPS-15-9560 kernel: [ 2.762096] nouveau 0000:01:00.0: DRM: failed to create kernel channel, -22
It looks like the nouveau graphics driver is the root cause of the boot failure: the last entry shows it failing to create a kernel channel.
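If you don't have log highlighting set up in your editor, the same kind of entries can be pulled out with grep. The sketch below is an assumption on my part rather than what I actually ran: the function name boot_errors is hypothetical, and the pattern only matches the two kinds of messages that stood out in my log, so it will likely need adjusting for other machines.

```shell
# Print machine-check and nouveau failure lines from a kernel log.
# $1 is the log file to search; defaults to /var/log/kern.log.
boot_errors() {
    grep -E 'mce: \[Hardware Error\]|nouveau .*failed' "${1:-/var/log/kern.log}"
}

# Only run against the real log when it exists and is readable.
if [ -r /var/log/kern.log ]; then
    boot_errors | tail -n 20
fi
```

Reading /var/log/kern.log may require elevated privileges depending on how the system is configured.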
Solution:
To keep things simple, I started by running the Ubuntu “Additional Drivers” application. My system already had the NVIDIA drivers installed, but it was using nouveau instead. After switching from the X.Org X Server nouveau driver to the NVIDIA driver, my system was able to boot again. I think this issue might be specific to my machine’s hardware.
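To confirm from a terminal which kernel driver the graphics hardware is actually using, the output of lspci -k can be filtered for its "Kernel driver in use" lines. This is a hedged sketch: the function name gpu_drivers_in_use is made up, and the device class names it matches (VGA compatible controller, 3D controller, Display controller) are my assumption about how the GPUs are listed on a given machine.

```shell
# Read `lspci -k` output on stdin and print the kernel driver bound to
# each graphics device. Device lines start with a bus address; the
# detail lines that follow them are indented.
gpu_drivers_in_use() {
    awk '
        /^[0-9a-f]/ {
            gpu = ($0 ~ /VGA compatible controller|3D controller|Display controller/)
        }
        gpu && /Kernel driver in use:/ {
            sub(/^[ \t]*Kernel driver in use:[ \t]*/, "")
            print
        }
    '
}

# Only call lspci when it is installed (it ships in the pciutils package).
if command -v lspci >/dev/null 2>&1; then
    lspci -k | gpu_drivers_in_use
fi
```

If nouveau still shows up after switching drivers and rebooting, the switch most likely did not take effect.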
An old post over at Stack Overflow suggests you may need to add the graphics-drivers PPA to get the NVIDIA drivers. It looks like my install already included them.
    # Optional! This information may be outdated.
    sudo apt-add-repository ppa:graphics-drivers/ppa
    sudo apt-get update
Credit to the folks over at bbs.archlinux.org who discussed this issue and helped me narrow it down.
Edit: (July 28th, 2017): This issue is being tracked on Launchpad (Ubuntu/Linux) here
Edit: (October 3rd, 2017): This issue is also reproducible when booting live distributions via USB. To resolve it, add the following boot parameter at startup:
    nouveau.modeset=0
See the Ubuntu Boot Options Documentation for more information.
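Once the system is up, you can check whether the parameter actually made it onto the kernel command line by reading /proc/cmdline. A minimal sketch follows; the helper name has_boot_param and its optional second argument are hypothetical and exist only to make the check easy to try against a sample file.

```shell
# Succeeds when parameter $1 appears as a whole word on the kernel
# command line. $2 is an optional file to read instead of /proc/cmdline.
has_boot_param() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -Fqx -- "$1"
}

if has_boot_param "nouveau.modeset=0"; then
    echo "nouveau.modeset=0 is active for this boot"
else
    echo "nouveau.modeset=0 is not on the kernel command line"
fi
```

Matching whole words avoids false positives from a different parameter that merely ends in the same text.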