Through some rigorous troubleshooting steps on a Ryzen 1700x recently, I have discovered that Ryzen Master was the cause of instability.
I am unable to rule out the suggestion that RAM compatibility might be part of the issue; however, the following symptoms were fixed after removing Ryzen Master from the system:
Stuttering/Choppiness during games
Random FPS drops during games
Crashes during games
RM was installed on the system as a tool to monitor temps and voltages and was not intended to overclock. Having a search around, I read that Ryzen Master (once installed) uses a “default” profile. The default profile apparently comes with an overclock (which I was unaware of).
After removing Ryzen Master from the system, games instantly ran as expected. CS:GO no longer randomly stutters or freezes. Overwatch is as smooth as silk.
I would actually go as far as to recommend not using Ryzen Master for anything. At all. Seriously.
I read somewhere about someone who had used Ryzen Master to overclock their CPU. The issue arose when they tried to reverse the overclock. RM apparently overrides BIOS settings, so resetting the BIOS is futile.
After removing Ryzen Master from Windows, they found that the overclock was still active. Flashing the BIOS still didn’t help. The solution was to reinstall Windows, as Ryzen Master settings (somewhere) were still changing BIOS settings once Windows had booted.
Sure, this part of my post is hearsay; however, after learning of this experience and solving my own issue, I will be sure not to install RM again (even if it is purely for monitoring purposes). If you plan to overclock, go old school!
After wasting a large amount of time on a recent problem detailed on this extensive blog post, I am unhappy about the way AMD drops driver packages.
This post will try to highlight some of the peculiar issues that I have noticed during this endless battle to stop BSODs happening to my shiny new Ryzen system.
AMD Drivers – UI
If you have an Intel CPU and an AMD GPU, or an AMD Zen CPU and an Nvidia GPU, you can read on with a smug grin. If you have both an AMD Zen CPU and AMD GPU, join the hair pulling club.
Trying to install AMD GPU and chipset drivers on the same system can start to make you feel insane, as it would anyone stuck in an infinite loop.
Effectively, they (the driver installers) fight for superiority; an AMD driver package you are trying to install checks the driver version against one that is already present on the system, regardless of the type of drivers that you are trying to install.
So for example:
You install AMD GPU driver 19.1.2. After the reboot, you say…
“Hey, it’s now time to install the chipset drivers while I’m at it!”
…which is not a bad idea at all.
You go to the website and jump through all of AMD’s hoops to pick the right driver and download chipset driver version 18.10.1810.
You start the installer and are confronted with a screen that effectively tells you that you are installing an older driver version, comparing it with the 19.1.2 GPU driver.
This comes down to a lack of clarity, which is frankly laziness on AMD’s part. And it doesn’t end there; the GPU drivers have an interface which allows you to find the driver version, which the chipset drivers lack, only adding further to the confusion.
Below are some screenshots taken from the AMD site, showing the revision numbers. Yes, they are different, as expected. They are two separate drivers after all, but the installation battle is determined here.
Currently, the way to overcome this is to install the chipset drivers first. The UI will install version 18.10.1810. Installing the GPU drivers afterwards means you will be “upgrading” to 19.1.2, but that package doesn’t contain any chipset drivers, so the currently installed chipset drivers will be left untouched.
Very confusing, AMD.
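To illustrate the clash, here is a rough Python sketch. It assumes (and this is only my guess, since AMD doesn’t document it) that the installer does a plain numeric comparison of package versions without checking what kind of driver each package contains. The function names are made up for this example and are not part of any AMD tool.

```python
def parse_version(text):
    """Turn a dotted version such as '19.1.2' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def looks_like_downgrade(installed, candidate):
    """Naive check (assumed behaviour): compare versions only, ignoring driver type."""
    return parse_version(candidate) < parse_version(installed)


gpu_driver = "19.1.2"          # GPU package from the example above
chipset_driver = "18.10.1810"  # chipset package from the example above

# GPU first, chipset second: the chipset package gets flagged as "older".
print(looks_like_downgrade(installed=gpu_driver, candidate=chipset_driver))

# Chipset first, GPU second: the GPU package simply looks like an upgrade.
print(looks_like_downgrade(installed=chipset_driver, candidate=gpu_driver))
```

Under that assumption, the install order alone decides whether you get the “older version” warning, which matches what I saw.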
The BSOD Blame Game
Issues with my system started when the Windows 10 1809 update dropped on my PC. When the problems appeared (many BSODs), I updated to the latest chipset drivers which (as it turned out) seemed to be incompatible with my “older” BIOS. Let’s explore this further.
Windows 10 1809 was officially released November 13, 2018. It wasn’t until late December that Windows Update decided that it was time to install this hefty upgrade. The upgrade happened as usual and completed without issue.
But something had changed to make my system unstable (did I mention many BSODs?!). The only thing I can think of is a driver update with 1809 or a change relating to the kernel which meant that something wasn’t working correctly. This forced me to look at updating to the latest drivers to be sure.
Alas, the latest drivers did not solve the problem and only made the issue worse. Windows 10 was the cause.
AMD – Drivers
For the time being, I’m ignoring AMD’s GPU driver roulette…
… and concentrating on chipset drivers. I think AMD likes to assume that you not only have the latest AGESA BIOS readily available for your motherboard, but also applied.
As it turns out, the latest chipset driver package doesn’t specify any patch notes, AGESA or pre-configuration requirements and it doesn’t specify exactly what is included in the package.
This most recent version could be AMD’s response to the new W10 1809 version! How ironic.
If only AMD could tell me that this was only compatible with the most recent AGESA BIOS and that if it wasn’t available, I could try an alternative driver.
Not afraid of getting my hands dirty, let’s check for a BIOS update!
GIGABYTE – BIOS
Currently, the timeline is around 2.5 months since Windows 10 1809 was released. Had I decided to update on release day, I would have had to wait through 2 months’ worth of BSODs until I found a solution. I finally found the solution on Gigabyte’s website.
The new BIOS (F25) is just over a week into its release (currently 25/01/2019). As you can clearly see from the description, it requires the latest chipset drivers that AMD released 26/10/2018. That could mean that the chipset drivers rely heavily on the F25 BIOS. Maybe.
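For anyone who wants to check the timeline, here is a quick Python sketch using only the dates mentioned above. The 30.4 days-per-month figure is just a rough average I picked for the conversion.

```python
from datetime import date

windows_1809_release = date(2018, 11, 13)  # Windows 10 1809 release date given above
chipset_release = date(2018, 10, 26)       # chipset driver date from the BIOS description
today = date(2019, 1, 25)                  # date of writing

days_since_1809 = (today - windows_1809_release).days
days_since_chipset = (today - chipset_release).days

# Print the gap in days and in approximate months.
print(days_since_1809, round(days_since_1809 / 30.4, 1))
print(days_since_chipset, round(days_since_chipset / 30.4, 1))
```

That works out to 73 days since the 1809 release and 91 days since the chipset drivers came out.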
After looking for information about AGESA 126.96.36.199, I found that Google is littered with speculation going as far back as May 2017. So I ask myself, how bad is Gigabyte’s hardware support? The answer is obvious, it seems*. So essentially, Gigabyte could have sat on AGESA 188.8.131.52 since as early as 2017.
It’s not as clear cut as this, though. There are different AGESA versions depending on the type of chip, ZEN, ZEN+ and ThreadRipper to name those that I dug up.
*Not so obvious, actually. AGESA is incredibly badly documented for the most part, and it seems only high-level slivers of the contents dribble down to consumers, such as “better memory support” etc.
I am happy to conclude that the issues (since the BIOS update) have been resolved and it was most definitely a software issue.
Although the constant BSODs only interrupted my ability to use the computer for a couple of weeks, I feel it’s still wholly unfair that we have to deal with these issues at a time when security updates are as important as they have ever been.
If indeed Gigabyte has been sitting on this update for a while, it is totally unacceptable that these things should be left beyond the last minute.
On the same note, AMD need to provide greater clarity about the contents and prerequisites of their drivers and stop blanketing their customers with just another iteration of the version number. Incredibly unhelpful.
My theory now is: due to the sheer number of different BSOD errors that all pointed to memory faults, it could only have been a collaborative effort from AMD and Microsoft to help mitigate Spectre on the AMD platform. But since the BIOS didn’t know about the changes, the drivers threw up errors.
Recently, I’ve been experiencing many BSODs in Windows. I’ve had a few different errors such as “KMode_Exception_Not_Handled” and “TCPIP.sys”, which ultimately threw up Kernel Power errors in Event Viewer.
After a few searches, the errors pointed to driver issues. This started to happen soon after upgrading to the latest Windows 10 version.
Starting with the network driver, I downloaded the package from the motherboard’s site and installed it, but the BSODs carried on happening. I then decided to reinstall both graphics drivers and chipset drivers from the AMD site. Alas, the BSODs persisted.
I decided to go down the “Old School” route by uninstalling the motherboard, AMD GPU and AMD Chipset drivers completely. I then used CCleaner to clear the registry and deleted the AMD folder located in C:\AMD.
Fully cleaned of old drivers, I installed all motherboard drivers, and then installed AMD Ryzen Chipset drivers BEFORE finally installing the AMD GPU drivers.
After a few reboots and some good hours of usage, the system seemed to be behaving itself! Until I turned it on the next day and was getting BSOD after BSOD.
The drivers weren’t the problem.
At this point, there wasn’t much more I could do with regard to the drivers. Clearly, there was an issue somewhere else and I had exhausted the “easy” options. A lot of the errors seemed to point loosely to perhaps bad RAM corrupting the drivers or the filesystem.
Going back to basics, I tested the system.
CHKDSK on drives – no issues
MEMTEST86+ – 8 passes no issues
Windows Shell “SFC” scan – no issues
Windows Memory Diagnostic test – 1 pass no issues
Reseated the RAM
Reseated the GPU
Stress test system with 3DMark – 1 BSOD, 1 PASS
Disabled some devices like GPU audio output and onboard sound in case of conflict
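The two built-in Windows checks from the list can also be run back to back from a script. This is only a sketch: it assumes the drive to check is C:, it needs to be started from an elevated (administrator) prompt, and the run_checks helper is a name I made up for this example. Plain chkdsk without switches runs in read-only mode, so it reports problems without trying to fix them.

```python
import subprocess
import sys

# Built-in Windows tools from the list above; both need an elevated prompt.
# The drive letter C: is an assumption, adjust it for your system.
CHECKS = [
    ("Disk check (read-only)", ["chkdsk", "C:"]),
    ("System file check", ["sfc", "/scannow"]),
]


def run_checks(checks=CHECKS, runner=subprocess.run):
    """Run each check in turn and return a dict of {label: exit code}."""
    results = {}
    for label, command in checks:
        completed = runner(command)
        results[label] = completed.returncode
    return results


# Only attempt the real commands when actually running on Windows.
if __name__ == "__main__" and sys.platform == "win32":
    for label, code in run_checks().items():
        print(f"{label}: exit code {code}")
```

The memory tests and the 3DMark stress test still have to be run by hand, since they either need a reboot or a separate program.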
At this point, I had a few things to think about. Overwhelmingly, most of the tests had passed.
Memory was good
Storage was good
Windows installation was good (apparently)
Which led me to consider these possibilities:
Bad GPU – BSOD ATI related errors, faulty hardware?
Bad Motherboard – BSOD memory-related errors?
Bad CPU – BSOD memory-related errors?
Bad PSU – Event Viewer Kernel Power errors?
Dodgy Windows update – corruption?
Whilst pondering these grim possibilities, I checked the drivers again on the motherboard’s website in hope of a new driver release which might solve my issues. Almost a week prior, there had been a new BIOS update released.
That’s when the penny dropped: the newest chipset drivers “might” not be working properly with the older motherboard firmware! This completely reasonable notion hadn’t occurred to me, since the release date on the BIOS was only a couple of days earlier but I had been having this issue for a couple of weeks. The morbid conclusion of a hardware failure (although not impossible) had now left my mind, and I was sure that this was the cause.
Before any BIOS update, reset back to the default configuration. I updated the BIOS, rebooted and meticulously went through the options to roughly restore my previous configuration. I booted back into Windows and no BSOD (yet).
I decided to do another 3DMark stress test, just to give the computer something to worry about. It scored 1 point above the last test.
A couple of restarts and hours of usage after, no sign of any issues. I re-enabled the devices that I had previously disabled and carried on to use the computer normally.
The system now seems solid, with not an error in sight yet. This is positive and I am confident that the new BIOS has fixed the stability issues.
To conclude, the newer drivers didn’t play nice with the older firmware, and the new BIOS seems to have solved the problem. But this highlights some other concerns…