Why does the GTX 1050 outperform the GTX 660?
Thread starter: ghost47
Start date: Nov 16, 2018
ghost47
I looked at the spec sheets on TechPowerUp (links at the end). The GTX 660 has a wider memory bus, more memory bandwidth, a higher texture rate, more GFLOPS, and more shading units and TMUs. The only things better on the GTX 1050 are its pixel rate, ROP count and clock speeds. Its process size is also smaller (better), and hence it has more transistors.

What makes the GTX 1050 so good that it outperforms not only the GTX 660 but also the GTX 760 (2 GB), despite having lower specs on paper? Is it that only clock speeds, pixel rates and transistor count matter? How should I judge a graphics card on paper?

PS: By "outperform" I mean in terms of FPS.

GTX 1050 specs: https://www.techpowerup.com/gpu-specs/evga-gtx-1050-sc-acx-2-0.b3900
GTX 660 specs: https://www.techpowerup.com/gpu-specs/evga-gtx-660.b1428
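For reference, the peak GFLOPS number on spec sheets is usually theoretical FP32 throughput: shader count x 2 (a fused multiply-add counts as two floating-point operations) x clock. The minimal Python sketch below shows that arithmetic. The core counts (5 x 192 for the GTX 660, 5 x 128 for the GTX 1050) follow the SMX/SM layouts discussed in this thread; the clock values are illustrative assumptions, not the cards' official clocks.

```python
def peak_fp32_gflops(shader_units: int, clock_mhz: float) -> float:
    """Theoretical FP32 peak: every shader unit retires one FMA (2 FLOPs) per clock."""
    flops_per_clock = shader_units * 2           # FMA = one multiply + one add
    return flops_per_clock * clock_mhz / 1000.0  # FLOPs/clock x MHz -> GFLOPS

# Core counts from the SMX/SM layouts: 5 x 192 (Kepler) and 5 x 128 (Pascal).
GTX_660_CORES = 5 * 192
GTX_1050_CORES = 5 * 128

# Assumed, illustrative clocks (roughly 1.0 GHz and 1.4 GHz), not official specs.
print(peak_fp32_gflops(GTX_660_CORES, 1000))
print(peak_fp32_gflops(GTX_1050_CORES, 1400))
```

This only gives the theoretical ceiling; it says nothing about how much of that peak a given architecture actually sustains in games.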
Tugrul_512bit
The GTX 660 uses the Kepler architecture. Kepler has 192 cores per SMX, but it can't use more than about 128 of them efficiently unless the CUDA kernel is well optimized. Also, each SMX has only 200 GB/s of bandwidth.

The GTX 1050 uses the Pascal architecture. Pascal has 128 cores per SM (its equivalent of the SMX), with a better design, so all cores are much better utilized by the same CUDA code. It also has much higher shared memory bandwidth per SM. Since both GPUs have 5 of these units and Pascal's are better, the GTX 1050 is the better GPU.

Did you know that in the first versions of the Quake games, the inverse square root was calculated in software on the CPU? Inverse square root (and plain square root) is important for gaming and rendering, so the more hardware you have for it, the better. Let's compare the GTX 660 and GTX 1050 on that:
- GTX 660: 32 SFUs (special function units, which can compute inverse square roots) per SMX.
- GTX 1050: 32 SFUs per SM, with lower latency AND higher frequency. Maybe roughly equivalent to 100 of them.

So, for any optimized code (gaming or not), the GTX 1050's fast on-chip memory, CUDA cores and all graphics-related pipelines are far superior to the GTX 660's. For unoptimized code, GDDR memory bandwidth becomes really important, so the GTX 660 can have some advantage there. But I think the GTX 1050's bigger cache narrows this gap, so the GTX 660 loses again.

GTX 1050:
- much more inverse-square-root throughput per CUDA core
- higher fast-memory bandwidth per CUDA core
- better utilization of all CUDA cores, instead of just 2/3 of them as on the GTX 660
- 2x the GPU frequency, which doubles everything above, so each SM acts like about 256 CUDA cores (128 x 2), with 2x the square roots and more bandwidth
- software optimizations for the newest architecture (everyone knows this, but it doesn't matter much)
- of course, a higher pixel rate lets you get more FPS when the other parts can keep up
- if you compare process size, account for nanometer scaling: 14 nm for Pascal vs 28 nm for Kepler suggests about 4x as many transistors in the same area ((28/14)^2 = 4). It's not that simple, since other factors keep you from fitting the full 4x, but it's enough to surpass the GTX 660
- pixel compression, so memory bandwidth and bus width matter less now

Please compare these:
- special function units (SFUs) per CUDA core: 1/4 for Pascal, 1/6 for Kepler, 1/8 for older architectures
- bandwidth per CUDA core: much higher on Pascal (each SM's bandwidth is shared by one-third fewer cores, i.e. 1.5x per core; +100% from frequency; +x from the architecture)
- texturing performance: 40 TMUs at 2 GHz with a much better architecture vs 80 TMUs at 1 GHz
- GFLOPS: about 1800 vs 1900 peak GFLOPS (GTX 1050 vs GTX 660), not much different, except that the GTX 1050 can utilize at least 10-20% more of its own peak than the GTX 660 does
- so achieved GFLOPS is about 1500 vs 1000 GFLOPS
- technologies: the GTX 660's Kepler chip can't overlap multiple workloads efficiently; Pascal has Hyper-Q to multitask efficiently per SM
- technologies: pixel compression on the GTX 1050
- technologies: dynamic parallelism on the GTX 1050
- driver updates per month
- the best way: benchmarking, because a benchmark is the best apples-vs-oranges comparison: https://gpu.userbenchmark.com/Compare/Nvidia-GTX-660-vs-Nvidia-GTX-1050/2162vs3650

More compute-related benchmarks:
- https://compubench.com/compare.jsp?benchmark=compu20d&did1=40772359&os1=Windows&api1=cu&hwtype1=dGPU&hwname1=NVIDIA+GeForce+GTX+1050&did2=4676&os2=Windows&api2=cu&hwtype2=dGPU&hwname2=NVIDIA+GeForce+GTX+660
- https://compubench.com/compare.jsp?benchmark=compu15d&did1=40772359&os1=Windows&api1=cl&hwtype1=dGPU&hwname1=NVIDIA+GeForce+GTX+1050&did2=4676&os2=Windows&api2=cl&hwtype2=dGPU&hwname2=NVIDIA+GeForce+GTX+660

On pure compute power, of course the GTX 660 is better, but that's a very optimized scenario. The two links above show that on average the GTX 1050 destroys the GTX 660, but in a few limited cases the GTX 660 can still hold its own.

If you are wondering about AMD's HD 7870 (1280 cores): https://compubench.com/compare.jsp?benchmark=compu15d&did1=40772359&os1=Windows&api1=cl&hwtype1=dGPU&hwname1=NVIDIA+GeForce+GTX+1050&did2=22324593&os2=Windows&api2=cl&hwtype2=dGPU&hwname2=AMD+Radeon+HD+7870+GHz+Edition
It is like a cache-nerfed GTX 1050: it has 64 cores per multiprocessor unit, but it isn't as efficient as the GTX 1050 in terms of core-to-core bandwidth. That's why it loses badly to the GTX 1050 in the particle benchmark but wins at T-Rex rendering. Similarly, an original GTX Titan is slower than a GTX 1060.

So, if you compare against a Pascal GPU:
- reduce Kepler (desktop) core counts by 33%
- reduce GCN (HD 7000 series) bandwidth by 50%
and you should get a ballpark approximation for typical optimized application code. But on fully optimized code, probably only in AAA+++ games and professional applications, the older cards can still compete.

[Marked as solution]
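The rules of thumb in this answer can be written down as a tiny calculator. This is a hedged Python sketch: the 128-of-192 Kepler scaling and the 50% GCN bandwidth cut are the poster's ballpark heuristics rather than vendor figures, and all function names are hypothetical.

```python
from fractions import Fraction

# SFUs per CUDA core, from 32 SFUs per multiprocessor as quoted in the answer.
SFU_PER_CORE = {
    "pascal": Fraction(32, 128),  # 1/4
    "kepler": Fraction(32, 192),  # 1/6
}

def pascal_equivalent_cores(kepler_cores: int) -> float:
    """Heuristic: typical code keeps only ~128 of every 192 Kepler cores busy (a 33% cut)."""
    return kepler_cores * 128 / 192

def pascal_equivalent_bandwidth(gcn_bandwidth_gbs: float) -> float:
    """Heuristic: halve HD 7000-series (GCN) bandwidth when comparing against Pascal."""
    return gcn_bandwidth_gbs * 0.5

def ideal_density_gain(old_nm: float, new_nm: float) -> float:
    """Idealized area scaling between process nodes; real chips fall short of this."""
    return (old_nm / new_nm) ** 2

print(SFU_PER_CORE["pascal"], SFU_PER_CORE["kepler"])
print(pascal_equivalent_cores(960))
print(ideal_density_gain(28, 14))
```

Under this heuristic, a 960-core Kepler card like the GTX 660 comes out at about 640 Pascal-equivalent cores, the same count as the GTX 1050, so clocks and the other factors in the list decide the outcome.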
Tugrul_512bit
For example, I have 2x Quadro K420 (Kepler), and I recently extended Nvidia's CUDA toolkit sample "nbody" benchmark with a number of CUDA optimization techniques to reach 65 percent of the two cards' combined peak GFLOPS: https://www.youtube.com/watch?v=aA6T-HPJeEE

I guess a Pascal GPU would reach (or even surpass?) 65 percent out of the box, without any optimization of the CUDA code. If someone with a GT 1030 or a GTX 1050 could run the nbody sample with the -benchmark -numbodies=65536 parameters, I'd appreciate the feedback. I don't expect anyone to surpass 72%, because the n-body algorithm isn't multiply-add only: there are plain adds, plain multiplies and inverse square roots too! The marketed GFLOPS values only count fused multiply-add instructions.
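The point about marketed GFLOPS can be made concrete: the peak assumes every core slot performs an FMA worth 2 FLOPs, so a kernel that also issues plain adds or multiplies (1 FLOP each) stays below 100% of peak even with perfect scheduling. Below is a minimal Python sketch under that simplified model; it ignores SFU work such as inverse square roots, memory stalls and issue limits, and the instruction counts in the example are hypothetical.

```python
def fma_limited_peak_fraction(n_fma: int, n_single: int) -> float:
    """Best-case fraction of marketed peak for a mix of FMA and single-op instructions.

    Each instruction occupies one core slot; the marketed peak assumes every
    slot performs an FMA (2 FLOPs), while a plain add or multiply yields 1 FLOP.
    """
    slots = n_fma + n_single
    if slots == 0:
        raise ValueError("need at least one instruction")
    useful_flops = 2 * n_fma + n_single
    return useful_flops / (2 * slots)

def percent_of_peak(achieved_gflops: float, peak_gflops: float) -> float:
    """Measured throughput as a percentage of the theoretical peak."""
    return 100.0 * achieved_gflops / peak_gflops

# Hypothetical kernel: 6 FMAs and 6 plain add/multiply instructions per iteration.
print(fma_limited_peak_fraction(6, 6))
```

Adding SFU-bound operations on top of this lowers the achievable share further, which is in line with the expectation that nbody can't get close to the full marketed peak.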