ThreadRacer pre-release teaser

Started by Jeremy Collake, May 26, 2012, 08:06:00 AM


Jeremy Collake

ThreadRacer Demo App and Early Preview of New Controls (screenshot #1)

I'm pulling it all together and debugging. There is more work to do, but I wanted to show you what I had in mind for ThreadRacer, the new demo application that lets you benchmark cores against each other. Consider that this CPU is an AMD Bulldozer platform, an architecture that AMD has permanently adopted (with incremental improvements). Much like HyperThreading, except that the logical cores are more powerful, the cores are paired into modules that share some computational units. That means when both cores of a pair are active (e.g. core 0 and core 1), neither can perform as fast as a single core whose 'paired core' is idle. Notice the difference between the performance of cores 0 and 1 and that of core 2.
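The kind of comparison described above can be sketched roughly as follows. This is only an illustration, not ThreadRacer's code: the names `count_iterations` and `relative_performance` are made up here, the busy loop is a stand-in workload, and the example counts are invented. Each worker spins for a fixed time and counts iterations, and the counts are then normalized against the fastest core.

```python
import time


def count_iterations(seconds):
    """Spin for roughly `seconds` and return how many loop iterations ran."""
    deadline = time.perf_counter() + seconds
    iterations = 0
    while time.perf_counter() < deadline:
        iterations += 1
    return iterations


def relative_performance(counts):
    """Normalize per-core iteration counts against the fastest core."""
    best = max(counts.values())
    return {core: n / best for core, n in counts.items()}


# Hypothetical counts for a paired-core CPU: cores 0 and 1 share a module
# and are both busy, while core 2 runs with its partner idle.
example = {0: 50, 1: 50, 2: 100}
print(relative_performance(example))
```

In a real test each worker would be restricted to its own core before calling `count_iterations`, so that the counts reflect one core each.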

The group controls you see here are ones that will be overlaid on the main new Process Lasso graph. They support any number of cores and can be of any size. In other words, pure new controls written from scratch. I still have some debugging work to do, which is why the % of max core freq isn't YET shown, but it will be soon. 64 cores? NO PROBLEM - it's just that each of those progress bars would be a lot thinner ;)

Note: In this screenshot the processor is incorrectly shown as 'HyperThreaded' when it is really AMD Bulldozer's paired core design. This will change quickly as I add more specific processor information. It is shown this way because the Windows 7 scheduler changes for Bulldozer treat these AMD processors (and future ones based on the next Bulldozer-like architecture) as HyperThreaded cores - a quick hack to improve performance.

New/Improved User Context Resolution + Service Integration (screenshot #2)

I've improved the user name resolution code, though this does increase the complexity of the rules. System processes are now sub-divided into specific system user contexts, each with its own level of security. This is good for accuracy but bad for ease of use (though I've done some things to help with that). It also works on PCs where it previously failed due to service configuration issues (either WMI or TS/RDS disabled).

Don't forget, as revealed before, that service names, now included in brackets, are also visible in the process view. When a single process instance hosts multiple services, all of them are shown.
Software Engineer. Bitsum LLC.

Jeremy Collake

And I'll offer some downloads of the prototype. There's an issue with the % of max frequency code that I'm still working to resolve, and it'll work very soon. One thing at a time ;).

32-bit: http://bitsum.com/files/threadracerdist32.exe
64-bit: http://bitsum.com/files/threadracerdist64.exe

WARNING: The core measurements likely will not work in non-English languages. This will be resolved with a simple code change; this isn't the final rendition. The % of max frequency won't work anywhere right now. BUT there are plenty of tools to show you these things in the interim.

edkiefer

Is this supposed to replace the CPU Eater demo?

Feedback: after you run it once, the bar graphs should reset, or there should be a refresh button. Other than that, maybe a bit of documentation (what the difference is between the top 1,2 threads and the bottom multi-core thread 1). I know the multi-core thread splits its workload among cores.

Looks good so far for a demo.

No slowdown with all cores at 100% with PL running.
Bitsum QA Engineer

Jeremy Collake

No, it's a different type of meter; it is not a replacement. I will explain more in time. Thanks for the feedback, indeed I forgot to reset those graphs. It is/was more a functional test of some new components, and it demonstrates the effect of what both AMD and Intel use now, which is 'paired' logical processors. Intel, of course, is using a much more extreme version of it, while AMD is using more of a 'two real processors, just sharing some stuff' approach. I will demonstrate the effects of this, and why keeping your thread from being scheduled on the wrong core can make a big impact. In certain situations, primarily single-threaded ones, in controlled situations or when there is only a single primary CPU consumer, it may be much more efficient to force the thread to do all its operations on one real core. The normal scheduler would likely do this anyway, but at times it would move the thread to another core and cause core thrashing (invalidation of the L1/L2 cache).
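Forcing work onto one core, as described above, can be sketched like this. It is a minimal illustration and not how ThreadRacer or Process Lasso does it: `pin_to_core` is a made-up helper, and it relies on `os.sched_setaffinity`, which Python only provides on some platforms (Linux, for example). On Windows the native calls for the same job are `SetProcessAffinityMask` and `SetThreadAffinityMask`.

```python
import os


def pin_to_core(core):
    """Restrict the current process to one logical core.

    Returns True if the affinity was changed, or False if this platform
    does not provide os.sched_setaffinity (e.g. Windows or macOS).
    """
    if not hasattr(os, "sched_setaffinity"):
        return False
    os.sched_setaffinity(0, {core})  # pid 0 means "the calling process"
    return True


if hasattr(os, "sched_getaffinity"):
    allowed = os.sched_getaffinity(0)  # cores we may currently run on
    target = min(allowed)              # pick one that is known to be valid
    pin_to_core(target)
    print("now restricted to:", os.sched_getaffinity(0))
    os.sched_setaffinity(0, allowed)   # restore the original affinity
```

The sketch restores the original affinity at the end so that it does not leave the process pinned by accident.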

edkiefer

OK, I only have a Core Duo 2.13, so no HT, just 2 cores. So I am probably not a good candidate to show issues with thread thrashing.
When I enable multi-core thread 1 I get a pretty even % across both CPUs (pretty close to 50/50%, with the total adding up to 1 core @ 100%).

Jeremy Collake

Yep, in your case it should be equal, except if one core gets busy with something else maybe. That's the idea, to show the effect of the new AMD Bulldozer platform's processor 'modules' (which AMD will be basing future processors on), and contrast that with Intel's HyperThreading... or the 'old school' actual fully independent processors on each core.

I need to update that screenshot so it doesn't show 'HyperThreaded' for a Bulldozer platform, as on a HyperThreaded CPU the difference would be even greater. As you can see though, for a Bulldozer platform, if one core of a pair is busy, it substantially affects the performance of the other core. Thus, awareness of this effect by both the system scheduler and (to a lesser extent) the user is important.

I also updated the build to reset the progress bars when you start a new test, FWIW.

Jeremy Collake

Oh, another *very important* aspect of ThreadRacer is showing how the scheduler handles a single CPU-consuming thread (as you can test with the multi-core thread). You can see how it swaps the thread around (core thrashing), or doesn't. The less it moves the thread to other cores, the better the performance will be, assuming nothing else is going on. This is because the L1 and L2 cache contents are lost when the thread is swapped to a different core, as each core has an independent L1 and L2 cache.
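One rough way to put a number on this kind of thread movement is to sample which core the thread is on at a fixed interval and count the changes. The sketch below only does the counting: `count_migrations` is a made-up name, the sample lists are invented, and how the samples are collected is left out (on Windows, `GetCurrentProcessorNumber` reports the core the calling thread is running on).

```python
def count_migrations(core_samples):
    """Count how often consecutive samples report a different core."""
    return sum(
        1
        for prev, cur in zip(core_samples, core_samples[1:])
        if prev != cur
    )


# Hypothetical samples of "which core is the thread on", taken at a fixed interval.
steady = [2, 2, 2, 2, 2, 2]
thrashing = [0, 1, 0, 3, 3, 1]
print(count_migrations(steady), count_migrations(thrashing))
```

A thread that stays put scores 0. Every change between samples is at least one move to a core whose L1/L2 cache does not yet hold the thread's data.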

edkiefer

OK, the bottom CPU core graph resets when ending a test, and the top thread iteration progress bars get reset at the start of a new test.

I assume the "percent of max CPU frequency" is only for turbo boost type CPUs, or at least ones that lower and raise their frequency dynamically?

Jeremy Collake

Quote from: edkiefer on May 30, 2012, 08:50:40 AM
I assume the "percent of max CPU frequency" is only for turbo boost type CPUs, or at least ones that lower and raise their frequency dynamically?

Right.
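
As a rough illustration of what such a figure means (a sketch only, with a made-up function name and invented clock readings, not ThreadRacer's actual calculation), the percentage is just the current clock divided by the maximum clock:

```python
def percent_of_max_frequency(current_mhz, max_mhz):
    """Return the current clock as a percentage of the maximum clock."""
    if max_mhz <= 0:
        raise ValueError("max_mhz must be positive")
    return 100.0 * current_mhz / max_mhz


# Hypothetical readings: a core that idles at 1400 MHz and tops out at 3600 MHz.
print(round(percent_of_max_frequency(1400, 3600), 1))
print(percent_of_max_frequency(3600, 3600))
```

On a CPU that never changes its clock, current and maximum are always equal, so the value would sit at 100 and tell you nothing.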


TfH

Man, this is a sweet benchmarking tool! I don't know how I managed to miss this post o_O
(When it's time to get a new CPU it's definitely 8-core or more, as my current 6-core seems so "lame" now LOL xD)





EDIT:
For the Finnish language the Set affinity button is too small, in case you are wondering why that screen cap is in English.

edkiefer

I know you were kidding about 6 cores being lame. The big issue, and one that I really don't see getting fixed, is that very few apps can be multi-threaded, and even when they can, it's hard to keep all threads fed with data so they run at 100%. Of course, if you multi-task then you benefit, depending on how many apps you are running.

IMO a quad core, maybe with HT, is more than enough to handle most apps users would use, and to multi-task too.
Once you get into 6-8+ cores it's going to be hard to keep them running at full speed.

TfH

Yeah, I know. It's hard to find an app that is (properly) optimized for multi-core. There is a lot of tinkering with settings and surfing through different benchmarking tools when you (and if you) are interested in seeing how it "loads" each core near evenly. For example, the latest GTA (that's just so wrongly coded [from a console base], basically no optimization at all): there was really no difference in CPU behavior between my main rig and my secondary rig (which has a 4-core CPU), even though the box says it's multi-core "optimized". Basically cores 0 and 1 took all the tasks and the rest tried to follow. But in Duke Nukem Forever there was a huge difference. I used the same "average" settings on the primary and the secondary rig. But that's not a completely reliable test, as no matter what, the GPU still has something to do, and my primary has a 69xxHD series GPU and my secondary a 59xxHD series GPU.

edkiefer

Right, there are not many: something like Photoshop, video transcoding, or, if you're a dev, compiling.

I think most games, for example, have a hard time with 4 cores (keeping them all running at full load).

I would like to see Intel keep shooting for more MHz while at the same time keeping the pipeline short. I don't really see a place for something like 16 cores for desktop users; maybe that will change in time, but I think it will be slow.

IMO I would stay with an Intel-based CPU right now; I don't really like what AMD did with Bulldozer.

Jeremy Collake

One thing to remember in your discussions is that the biggest problem with Bulldozer (and its descendants) is currently the Windows scheduler. Thus, as it improves to *properly* handle them, instead of using this quick hack of treating them like hyper-threaded processors, their performance will improve. That said, until it happens, it is just a pipe dream.

TfH

Quote from: edkiefer on June 04, 2012, 06:51:52 PM
Right, there are not many: something like Photoshop, video transcoding, or, if you're a dev, compiling.

Heh, it's like you have read my mind. All three apply to me, except I use Corel PaintShop Pro X4 Ultimate instead of Photoshop :-)

I've always been an AMD/ATI fanboy. Maybe it's because long ago I got a really cool custom AMD T-shirt (sadly, it's no longer with us, R.I.P. :( )

So sorry for posting this totally OT pic, but I feel like boasting a bit, as I've never been this close to the optimal score in WEI (Windows Experience Index). Optimal is 7.9:



EDIT:
@bitsum.support yeah, you are absolutely right. It is so easy to get sidetracked on this.