AMD Bulldozer platform discussion

Started by Jeremy Collake, December 08, 2011, 12:12:40 AM


Jeremy Collake

I have been actively testing the new AMD Bulldozer architecture, and have been testing Turbo Core since the AMD 1055T. I am currently testing the AMD FX-8150. I am working on new technologies, but can't say more at this time. The Windows CPU scheduler does indeed have a deficiency when it comes to dynamically overclocked processors: it has no awareness of the per-core clock boosts. I suspect this will change, since Intel also has its own Turbo Core-type technology called Turbo Boost. You can bet that Intel will insist the scheduler become aware of it ASAP, and MS is more likely to pay attention to Intel's insistence than to AMD's. However, scheduler changes must be carefully tested, so whether this shows up in a future Win7 service pack or in Win8, I don't know... if it ever shows up.

My general recommendation for enthusiasts who overclock their gaming rigs and such is to turn off Turbo Core / Turbo Boost, and then OC as far as you can while keeping the system stable. Of course, I recommend that NOBODY overclock their system, but for those who are already taking this risk and know what they are doing, that is the general recommendation.

As a side note, I have determined that some motherboard utilities that claim to push Turbo Core even further on AMD processors DO lead to system instability under high loads. It took me a long time before I could say that definitively, but it is true.

P.S.
Let us hope we don't have to wait for Win8, as I've so far been underwhelmed by it. I can only hope the whole touch screen thing can be completely disabled and the legacy Start Menu is brought back. Surely MS is not betting on the PC going away. I mean, try for yourself to reach up and touch your LCD; it is a pain in the butt. I'll take a mouse any day. I can understand them wanting to make Win8 work seamlessly on touch screens, but they must preserve full legacy capabilities. Tablets have their place, but the PC remains by far the most productive device, and productivity is where the money is (contrary to popular consumer opinion). There is nothing I can't do 10x faster on a PC than on a tablet, and a lot that is still (and will always be) cumbersome or impractical to do on a tablet.
Software Engineer. Bitsum LLC.

Jeremy Collake

... some posts deleted to be on-topic ...

However, as for scheduler improvements, your idea of how video drivers adapt themselves to various games would actually be a good one, with a slight deviation I'll throw in. Instead of (or in addition to) using a database of applications and such, a general scheduler replacement and/or plug-in interface could be developed. That way, the scheduler itself could be replaced per CPU model: a new CPU comes out, and a scheduler driver update comes with it. The problem, of course, is that there is considerable complexity in replacing, or even complementing via a plug-in, a component this essential to the basic functioning of the OS. In fact, that's why Microsoft doesn't update it quickly: updates to it must be carefully tested, and there are backwards-compatibility issues. It is an interesting concept, though: a more dynamic scheduler that isn't a simple drone always doing the same thing.
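To make the plug-in idea a bit more concrete, here is a minimal Python sketch of a per-CPU-model placement policy registry. Everything in it is hypothetical: Windows exposes no such interface, and the names `POLICIES`, `pick_cpu`, `even_cpus_first` and the use of a model string as the lookup key are invented for illustration. A real scheduler plug-in would live in kernel mode and weigh far more than an idle/busy flag.

```python
from typing import Callable, Dict, List, Optional

# A placement policy receives one busy/idle flag per logical CPU and returns
# the index of the CPU a newly runnable thread should go to, or None if all are busy.
PlacementPolicy = Callable[[List[bool]], Optional[int]]


def first_idle(busy: List[bool]) -> Optional[int]:
    """Generic fallback policy: lowest-numbered idle CPU."""
    for cpu, is_busy in enumerate(busy):
        if not is_busy:
            return cpu
    return None


def even_cpus_first(busy: List[bool]) -> Optional[int]:
    """Example vendor policy: try even-numbered CPUs, then odd-numbered ones."""
    order = list(range(0, len(busy), 2)) + list(range(1, len(busy), 2))
    for cpu in order:
        if not busy[cpu]:
            return cpu
    return None


# Hypothetical registry: a "scheduler driver update" shipped with a new CPU
# would simply add or replace an entry here.
POLICIES: Dict[str, PlacementPolicy] = {
    "AMD FX-8150": even_cpus_first,
}


def pick_cpu(cpu_model: str, busy: List[bool]) -> Optional[int]:
    """Dispatch to the model-specific policy, falling back to the generic one."""
    return POLICIES.get(cpu_model, first_idle)(busy)
```

With CPU 0 busy on an 8-CPU part, `pick_cpu("AMD FX-8150", [True] + [False] * 7)` returns 2, while an unrecognized model falls back to `first_idle` and returns 1. Keeping a generic fallback is one way to address the backwards-compatibility worry: CPUs without a registered policy behave exactly as before.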

Linux could actually do this quicker and easier than Microsoft, being an open source system where the CPU scheduler is already swapped out by advanced users at times. It would be interesting to see benchmarks of CPU performance done on Linux using schedulers provided by the CPU manufacturers.
Software Engineer. Bitsum LLC.

Jeremy Collake

An offshoot of this idea is to allow benchmarking of new processors using custom CPU schedulers in Linux. This could show us the theoretical gains that might be had if such modifications were made to the Windows CPU scheduler. See https://plus.google.com/111452122533164797807/posts/1P8rYjuYvvw for more information.
Software Engineer. Bitsum LLC.

edkiefer

Quote from: bitsum.support on December 08, 2011, 08:57:27 PM
That's quite true. I was mostly joking, as there is indeed no chance Microsoft would include this in Windows.

However, as for scheduler improvements, your idea of how video drivers adapt themselves to various games would actually be a good one, with a slight deviation I'll throw in. Instead of (or in addition to) using a database of applications and such, a general scheduler replacement and/or plug-in interface could be developed. That way, the scheduler itself could be replaced per CPU model: a new CPU comes out, and a scheduler driver update comes with it. The problem, of course, is that there is considerable complexity in replacing, or even complementing via a plug-in, a component this essential to the basic functioning of the OS. In fact, that's why Microsoft doesn't update it quickly: updates to it must be carefully tested, and there are backwards-compatibility issues. It is an interesting concept, though: a more dynamic scheduler that isn't a simple drone always doing the same thing.

Linux could actually do this quicker and easier than Microsoft, being an open source system where the CPU scheduler is already swapped out by advanced users at times. It would be interesting to see benchmarks of CPU performance done on Linux using schedulers provided by the CPU manufacturers.

Yes, I like the CPU plug-in idea. For you (PL) I could see it working (that is why I mentioned the idea :) ). The only question would be how hard it is to implement in Windows without too much resource overhead. Probably not bad, as PL does a good job.

BTW, a little OT, but I was looking at the NT TweakScheduler settings in PL. I am pretty sure I never messed with them, but I compared the settings in your documentation with mine, and mine differ on the last option: I have 3x on "give foreground triple length interval".
Is that the right default on XP SP3? I think the issue is that it differs depending on the OS. Maybe list what the defaults should be somewhere as a reference.
Bitsum QA Engineer

Jeremy Collake

Quote
BTW, a little OT, but I was looking at the NT TweakScheduler settings in PL. I am pretty sure I never messed with them, but I compared the settings in your documentation with mine, and mine differ on the last option: I have 3x on "give foreground triple length interval".
Is that the right default on XP SP3? I think the issue is that it differs depending on the OS. Maybe list what the defaults should be somewhere as a reference.

My documentation certainly needs an update in that regard; I'll take a look at it. I have neglected the docs for a while.

The defaults differ only between the Server and Workstation (standard) editions of Windows. It should also be noted that the same setting can be changed in Windows itself, on the Advanced tab of the Performance Options dialog (the same place where you set the page file). It is the option to dedicate more CPU resources to foreground programs or to background services. On workstation editions of Windows it defaults to foreground, and on Server editions it defaults to background.
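This foreground/background option is stored in the registry value `Win32PrioritySeparation` under `HKLM\SYSTEM\CurrentControlSet\Control\PriorityControl`. The Python sketch below decodes its low six bits using the field layout described in the Windows Internals books (Russinovich et al.). Treat that layout, including the edition defaults it falls back to and the rule that the boost only applies with variable quanta, as an assumption to verify rather than something stated in this thread.

```python
def decode_priority_separation(value: int, server: bool = False) -> dict:
    """Decode Win32PrioritySeparation (assumed layout: bits 4-5 interval,
    bits 2-3 variable/fixed quanta, bits 0-1 foreground quantum boost)."""
    interval_bits = (value >> 4) & 0b11
    variable_bits = (value >> 2) & 0b11
    boost_bits = value & 0b11

    # 1 = long, 2 = short; 0 or 3 mean "edition default" (assumed: short on
    # workstation editions, long on server editions).
    if interval_bits == 1:
        interval = "long"
    elif interval_bits == 2:
        interval = "short"
    else:
        interval = "long" if server else "short"

    # 1 = variable, 2 = fixed; 0 or 3 mean "edition default"
    # (assumed: variable on workstation editions, fixed on server editions).
    if variable_bits == 1:
        variable = True
    elif variable_bits == 2:
        variable = False
    else:
        variable = not server

    # 0 -> 1x, 1 -> 2x, 2 -> 3x foreground quantum; 3 is treated like 2.
    # The boost is assumed to apply only when quanta are variable.
    multiplier = min(boost_bits, 2) + 1 if variable else 1

    return {"interval": interval, "variable": variable,
            "foreground_multiplier": multiplier}
```

Under this assumed layout, a raw value of 2 on a workstation edition decodes to short, variable quanta with a 3x foreground quantum, which would line up with the "give foreground triple length interval" setting reported above, while the same value on a server edition decodes to long, fixed quanta with no boost.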
Software Engineer. Bitsum LLC.

edkiefer

OK, then it is fine, as I have both options set to Programs on the Advanced tab of Performance Options in Windows.
Bitsum QA Engineer

edkiefer

You mean the dynamic overclocking of CPU clocks? Is it that different from Intel's, since Intel has been doing this for a while now?

I always wondered how efficient it is to overclock in software, meaning the system boots at X speed and then you boost it within Windows.

So I would guess the OS is not properly optimized for this, compared to setting the speed you want in the BIOS, where Windows sees it at boot (and saw it when it was installed).

So I imagine Intel has some kind of driver/DLL to handle its Turbo Boost; there must be all kinds of timing issues to work out, I would think.
Bitsum QA Engineer

Jeremy Collake

The details are unknown to us really. It may be at a lower level. However, the notion of suddenly boosting a handful of random cores up in clock speed, without the scheduler being aware of this change, is simply troublesome in itself. The scheduler might shift time sensitive code to a slower core, then back to a faster core. It's all so random, it just bothers me. Frequency scaling for power consumption, as has been done for a decade or so, at least scales the entire CPU (all cores), not individual ones.
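A little arithmetic shows why per-core boosting makes timing unpredictable. The Python sketch below computes the wall-clock time of a fixed chunk of work split into segments that run on cores at different clocks. The 3.6 and 4.2 GHz figures used with it are illustrative assumptions, not measurements, and the model ignores migration cost, cache effects and memory stalls.

```python
from typing import Iterable, Tuple


def run_time_ms(segments: Iterable[Tuple[float, float]]) -> float:
    """Wall-clock time in ms for (cycles, core_clock_ghz) segments run back to back.

    Assumes one instruction stream, no migration cost and no memory stalls,
    so each segment simply takes cycles / frequency.
    """
    total_seconds = 0.0
    for cycles, ghz in segments:
        total_seconds += cycles / (ghz * 1e9)
    return total_seconds * 1000.0
```

For example, 36 million cycles take 10 ms entirely on a 3.6 GHz core, but about 9.29 ms if the scheduler happens to run half of them on a core boosted to 4.2 GHz. The same work finishes at a different time depending purely on where it was placed.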

It is not much different from Intel's, no, but the processor architectures differ and the scheduler may behave differently, so the end effect may differ. UPDATE: Indeed, looking at the architecture, each pair of cores in Bulldozer shares an FPU, an L2 cache, and more. They are not 100% self-sufficient cores.

Honestly, in my testing, Intel's biggest advantage is HyperThreading, and the Windows scheduler using that HyperThreading pretty well (e.g. not putting too much strain on those 'fake' cores that only have maybe 10% of the computing capacity of a real CPU). This really makes for a very responsive system. I had dreaded making this statement, but every test I run shows that HyperThreading helps a lot.
Software Engineer. Bitsum LLC.

edkiefer

Yes, I have read that Intel has been improving HyperThreading tech, both in the CPU and in the OS.

I think the main push for this is that, with so many cores (which many apps don't even take advantage of), they're trying to power down sections/cores to save power.

It's probably very important on laptops (battery-operated systems) but not so much on desktops; then again, in server use it might add up to some decent savings.
Bitsum QA Engineer

Jeremy Collake

Read http://vr-zone.com/articles/microsoft-comes-to-amd-bulldozer-rescue-windows-update-speeds-up-things/14256.html for more information about how Microsoft plans to go about this fix, according to that article anyway. It intends to treat the extra cores essentially as if they were HyperThreaded cores, and thus avoid them: it will see your CPU as a 4-core processor with 8 logical cores. This might not sound ideal, and it is an incomplete solution, but it would be an improvement.

Another temporary solution is to use Process Lasso to set a default affinity of at most 4 cores, for better processing performance in some applications.
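Affinity is expressed as a bitmask with one bit per logical CPU. The post doesn't say which four cores to use, so this Python sketch shows two plausible choices. The assumption that logical CPUs 2k and 2k+1 form one Bulldozer module is an illustration only and should be checked against the actual topology.

```python
from typing import Iterable


def mask_for(cpus: Iterable[int]) -> int:
    """Build an affinity bitmask with bit n set for each logical CPU n."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def first_n_mask(n: int) -> int:
    """The lowest n logical CPUs, e.g. 0-3 (packs two modules)."""
    return mask_for(range(n))


def one_per_module_mask(total_cpus: int, cores_per_module: int = 2) -> int:
    """One logical CPU per module, assuming CPUs 2k and 2k+1 share a module."""
    return mask_for(range(0, total_cpus, cores_per_module))


print(hex(first_n_mask(4)))         # 0xf  -> CPUs 0, 1, 2, 3
print(hex(one_per_module_mask(8)))  # 0x55 -> CPUs 0, 2, 4, 6
```

Both masks limit the process to four cores, but the second spreads it across all four modules, so the process's own threads never have to share a module's FPU and L2 cache with each other.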
Software Engineer. Bitsum LLC.

edkiefer

I am surprised they're only claiming a single-digit % improvement. I was sure even Intel's HT gets a double-digit % increase.
Bitsum QA Engineer

edkiefer

Bitsum QA Engineer

Jeremy Collake

Nice, thanks for the info ;). I'll be experimenting ASAP.
Software Engineer. Bitsum LLC.

Jeremy Collake

This update DOES change the way Windows sees your CPU, and can cause Process Lasso to deactivate in some scenarios. This is because it changes the CPU scheduler's recognition of these AMD processors from (for example) 8 real cores to 4 real cores with 4 hyper-threaded (fake) cores. This is essentially a cheap hack to get performance on par and to take advantage of Turbo Core and the intrinsic characteristics of the Bulldozer architecture (e.g. a shared L2 cache per module of 2 cores): it tells the scheduler to keep its load on no more than half (every other one) of the processors if it can, and to keep any large load off the other cores if it can.

For example, before this update the CPU readout would show "[8 cores: 8 logical]", meaning 8 physical (real) cores and 8 total (logical) cores.

After applying the patch, the OS thinks there are only 4 real cores plus 4 hyper-threaded (fake) cores, i.e. "[4 cores: 8 logical]". In this way, it is an easy and quick 'hack' to get the scheduler to play nice with the Bulldozer platform: they simply reused the pre-existing support for HyperThreaded CPUs.

This will not hurt performance; it will help it. The other cores will still be fully used in highly threaded situations; they are just avoided otherwise.

Lastly, this update causes every other core (the ones it considers 'fake') to be parked until needed, so it may conserve energy as well.
Software Engineer. Bitsum LLC.

edkiefer

So basically they're just trying to make the OS think it's an Intel with HT, but that means the so-called virtual cores will get used less.
That kind of defeats the purpose of getting an 8-core processor, but I guess it does fix the issues they had.
Bitsum QA Engineer

DeadHead

Quote from: edkiefer on January 13, 2012, 08:44:54 AM
So basically they're just trying to make the OS think it's an Intel with HT, but that means the so-called virtual cores will get used less.
That kind of defeats the purpose of getting an 8-core processor, but I guess it does fix the issues they had.

I both agree and disagree with this. Depending on usage, of course, I do think MS implemented this in the best way possible, at this moment. For applications that actually use several cores (quite a few come to mind), this will allow full use of all eight cores. I can't see any better way for the scheduler to handle this new processor. Well, they could have (for that oh-so-important psychological effect!) skipped the idea of calling it "4 cores: 8 logical" and instead just called them 8 physical cores, using core parking on every second core.
Windows 10 Pro 64 (swedish) || Xeon 5650 @ +4 GHz || 24 gig ram || R9280 Toxic

Jeremy Collake

If I were in Microsoft's shoes, I'd likely have done the same thing. This is indeed the best way to handle it without having to wait 6 months for extensive new refactoring and regression testing of a critical part of the OS. Better to have done this than nothing, though some consumers are going to be a bit confused. Indeed, given the architecture of Bulldozer, I am starting to wonder if it isn't AMD's way of introducing HyperThreading without it being HyperThreading. Sure, they are full cores, but each pair shares an L2 cache, an FPU, and more, making them 'Bulldozer modules', as AMD calls them.

If you read more about the Bulldozer architecture, e.g. on Wikipedia, the article also describes the similarities (and differences) between a Bulldozer module (a pair of cores) and an HT core: http://en.wikipedia.org/wiki/Bulldozer_(microarchitecture)

Software Engineer. Bitsum LLC.

Jeremy Collake

Bulldozer architecture, from Wikipedia: notice that, besides benefiting from Turbo Core by keeping work on every other CPU, the Bulldozer architecture pairs cores together, with each pair sharing an L2 cache, an FPU, and other items. You definitely want to keep the load on one core per module (every other core) as long as possible. The diagram there is of an 8-way CPU, not a 4-way one: it is 4 pairs of cores.

UPDATE: Now, imagine arguing in court about whether this is HyperThreading or not. AMD seems to be in the clear, and now they can start sharing computation resources between cores, which is perhaps where they save money per core, and perhaps the purpose of this design change. To end users, the total core count is what is often seen, and AMD surely wants as many cores as they can get, but they can't step on the toes of Intel's patented HyperThreading.
Software Engineer. Bitsum LLC.

DeadHead

I'm a bit curious how the new Bulldozer, with this new scheduler approach, compares to the older Phenom II X6 1090T in terms of performance.
Windows 10 Pro 64 (swedish) || Xeon 5650 @ +4 GHz || 24 gig ram || R9280 Toxic

edkiefer

I don't know about anyone else, but what seems so odd to me is that this new CPU architecture must have been in development for many years, yet it seems they waited until release to get the software side working on the OSes. Surely they could have worked with MS while it was in an alpha or beta state; it just seems weird.
Bitsum QA Engineer

Jeremy Collake

Quote from: DeadHead on January 14, 2012, 02:07:45 AM
I'm a bit curious how the new Bulldozer, with this new scheduler approach, compares to the older Phenom II X6 1090T in terms of performance.

I never owned the more expensive 1090T, but I do have its cheaper little brother, the 1055T, and was going to run some tests. Sadly, its clock speed is a bit low for a comparison. I can tell you that after moving from a 1055T to an 8150, I sadly did not perceive any build speed improvement. There may have been one, but not enough for me to notice. I still end up waiting forever for all of Process Lasso's editions to build, and for Process Lasso builds the CPU is the primary bottleneck. During about 85% of the build, all cores are maxed out; the remainder is single-threaded operations.
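The 85% / 15% split above allows a quick back-of-the-envelope model of why an upgrade can feel flat. This Python sketch treats the old build as 15% single-threaded and 85% all-cores wall-clock time and asks how the total changes for given improvements in each part. The speedup factors passed in are hypothetical inputs, not measurements of the 1055T or 8150.

```python
def relative_build_time(parallel_frac: float,
                        single_speedup: float,
                        multi_speedup: float) -> float:
    """New build time as a fraction of the old one (1.0 = no change).

    parallel_frac:  share of the old wall-clock time with every core maxed out.
    single_speedup: how much faster the new CPU runs the single-threaded steps.
    multi_speedup:  how much more all-core throughput the new CPU delivers.
    """
    if not 0.0 <= parallel_frac <= 1.0:
        raise ValueError("parallel_frac must be between 0 and 1")
    serial_frac = 1.0 - parallel_frac
    return serial_frac / single_speedup + parallel_frac / multi_speedup


# Hypothetical: same single-thread speed, 20% more all-core throughput.
print(round(relative_build_time(0.85, 1.0, 1.2), 3))  # 0.858
```

Even a 20% gain in all-core throughput only trims about 14% off the build in this model, and a slightly slower single-thread speed would eat into that, which is easy to miss without a stopwatch.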

The hotfix doesn't apply to my highly threaded scenario, but I suspect that *without* the hotfix, an average user would have been better off with the 1090T or 1055T. With the hotfix, the new processor should be a bit better, but it sure is no giant leap forward.
Software Engineer. Bitsum LLC.

edkiefer

Bitsum QA Engineer

Jeremy Collake

Thanks Ed, as always. I would like to quote their Final Words:

Quote
"AMD didn't overpromise as far as the benefits of these new scheduling/core parking hotfixes for Windows 7 are concerned. Single digit percentage gains can be expected in most mixed workloads, although there's a chance that you'd see low double digit gains if the conditions are right. It's important to note that the hotfixes for Windows 7 aren't ideal either. They simply force threads to be scheduled on empty modules first rather than idle cores on occupied modules. To properly utilize Bulldozer's architecture we'd need a scheduler that schedules both based on available cores/modules but biases its scheduling depending on data dependency between threads."
Source: http://www.anandtech.com/show/5448/the-bulldozer-scheduling-patch-tested/4
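The difference AnandTech describes, empty modules first versus idle cores on occupied modules, can be shown with a toy placement routine in Python. It assumes 4 modules of 2 cores, numbered so that CPUs 2k and 2k+1 share a module, and it only models where threads land, not the Windows scheduler's actual logic.

```python
from typing import List


def place_threads(n_threads: int, n_modules: int = 4,
                  cores_per_module: int = 2,
                  module_first: bool = True) -> List[int]:
    """Return the logical CPUs that n_threads would occupy, in order.

    module_first=True fills one core on every module before using any
    second core (the behaviour the quote attributes to the hotfixes);
    False simply takes the lowest-numbered idle cores.
    """
    total = n_modules * cores_per_module
    if n_threads > total:
        raise ValueError("more threads than cores")
    if module_first:
        order = [m * cores_per_module + c
                 for c in range(cores_per_module)
                 for m in range(n_modules)]
    else:
        order = list(range(total))
    return order[:n_threads]


def shared_modules(cpus: List[int], cores_per_module: int = 2) -> int:
    """Count modules that host more than one of the given threads."""
    modules = [cpu // cores_per_module for cpu in cpus]
    return sum(1 for m in set(modules) if modules.count(m) > 1)


print(place_threads(4), shared_modules(place_threads(4)))      # [0, 2, 4, 6] 0
naive = place_threads(4, module_first=False)
print(naive, shared_modules(naive))                             # [0, 1, 2, 3] 2
```

As the quote points out, neither ordering looks at which threads share data; that data-dependency bias is the part a fully Bulldozer-aware scheduler would add.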
Software Engineer. Bitsum LLC.

edkiefer

Yes, he goes into detail about what's going on with the cores and sets of cores. In the end, the Win8 beta also does better by a few %.

All that said, if you compare dual, quad, etc. in Intel's lineup, they seem to scale much better, but that might just be the architecture of the CPU itself and not so much an OS/scheduler thing.

Edit: I hate to get too far off topic, but I made a statement and wanted to check it.

I found this site that tested scaling of AMD and Intel chips in Linux. Not in Windows, but still a valid test, maybe even better for not being Windows ;D

http://www.phoronix.com/scan.php?page=article&item=amd_bulldozer_scaling&num=1
Bitsum QA Engineer

Jeremy Collake

Thanks again, Ed. I figured someone would run some benchmarks in Linux, something I've advocated since this whole issue came up. The scheduler is so easily replaced, you can even run multiple tests with different schedulers to find out which is the most optimal. It is relevant because it shows the potential of the hardware, *if* the software is correct. In other words, it is relevant to give Microsoft a kick in the pants to make sure their scheduler is up to par. That said, they did issue these AMD Bulldozer hotfixes very quickly. Sure, they took a quick shortcut, but they made a big difference with it.
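Short of building a kernel with a different scheduler, one cheap way to approximate such placement experiments on Linux is CPU affinity. This Python sketch (Linux only) uses `os.sched_setaffinity` to run a command on a chosen set of CPUs and times it. It does not swap the scheduler; it only constrains where it may place threads. Which CPU numbers share a module is an assumption you should check on your machine (e.g. with `lscpu`), and the benchmark command is up to you.

```python
import os
import subprocess
import time
from typing import Iterable, List


def pinned_run(cmd: List[str], cpus: Iterable[int]) -> float:
    """Run cmd restricted to the given CPUs (Linux only); return wall-clock seconds."""
    # Only keep CPUs this process is actually allowed to use.
    allowed = set(cpus) & os.sched_getaffinity(0)
    if not allowed:
        raise ValueError("none of the requested CPUs are available")
    start = time.perf_counter()
    # The child applies the affinity mask to itself before exec; threads it
    # creates later inherit the mask.
    subprocess.run(cmd, check=True,
                   preexec_fn=lambda: os.sched_setaffinity(0, allowed))
    return time.perf_counter() - start
```

For a multi-threaded benchmark `cmd`, comparing `pinned_run(cmd, {0, 2, 4, 6})` with `pinned_run(cmd, {0, 1, 2, 3})` would contrast a one-core-per-module layout with a packed one, provided the module numbering assumption holds on that system.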

The one thing I wonder is whether the updates have been pushed out via Windows Update yet, or if they will be. I would imagine they will.
Software Engineer. Bitsum LLC.

edkiefer

I would assume that once the hotfix is tested enough and shows no issues, they will make it available through Windows Update.
Bitsum QA Engineer

Jeremy Collake

Yes, most likely, though we won't really know until it happens, if it has not already. Sometimes corporations don't follow the most intuitive of rational logic ;p.
Software Engineer. Bitsum LLC.

edkiefer

Bitsum QA Engineer

Jeremy Collake

Yeah, I saw that, and even commented, if the comment wasn't removed, since ThreadRacer is a tool I thought Bulldozer users might be interested in. While the visual effects are not done, the basic functionality is.
Software Engineer. Bitsum LLC.

edkiefer

I just checked; yes, your comments are still there.

I don't think they were there when I read it, or I just missed them; when I look at comments I generally only go down a few. Interesting results anyway.
Bitsum QA Engineer