The NVIDIA GeForce GTX 1660 Ti Review, Feat. EVGA XC GAMING: Turing Sheds RTX for the Mainstream Market
by Ryan Smith & Nate Oh on February 22, 2019 9:00 AM EST
When NVIDIA put their plans for their consumer Turing video cards into motion, the company bet big, and in more ways than one. In the first sense, NVIDIA dedicated whole logical blocks to brand-new graphics and compute features – ray tracing and tensor core compute – and they would need to sell developers and consumers alike on the value of these features, something that is no easy task. In the second sense however, NVIDIA also bet big on GPU die size: these new features would take up a lot of space on the 12nm FinFET process they’d be using.
The end result is that all of the Turing chips we’ve seen thus far, from TU102 to TU106, are monsters in size; even TU106 is 445mm², never mind the flagship TU102. And while the full economic consequences that go with that decision are NVIDIA’s to bear, for the first year or so of Turing’s life, all of that die space that is driving up NVIDIA’s costs isn’t going to contribute to improving NVIDIA’s performance in traditional games; it’s a value-added feature. Which is all workable for NVIDIA in the high-end market where they are unchallenged and can essentially dictate video card prices, but it’s another matter entirely once you start approaching the mid-range, where the AMD competition is alive and well.
Consequently, in preparing for their cheaper, sub-$300 Turing cards, NVIDIA had to make a decision: do they keep the RT and tensor cores in order to offer these features across the line – at a literal cost to both consumers and NVIDIA – or do they drop these features in order to make a leaner, more competitive chip? As it turns out, NVIDIA has opted for the latter, producing a new Turing GPU that is leaner and meaner than anything that’s come before it, but also very different from its predecessors for this reason.
That GPU is TU116, and it’s part of what will undoubtedly become a new sub-family of Turing GPUs for NVIDIA as the company starts rolling out Turing into the lower half of the video card market. Kicking things off in turn for this new GPU is NVIDIA’s latest video card, the GeForce GTX 1660 Ti. Launching today at $279, it’s destined to replace NVIDIA’s GTX 1060 6GB in the market and is NVIDIA’s new challenger for the mainstream video card market.
|NVIDIA GeForce Specification Comparison|
| ||GTX 1660 Ti||RTX 2060 Founders Edition||GTX 1060 6GB (GDDR5)||RTX 2070|
|Memory Clock||12Gbps GDDR6||14Gbps GDDR6||8Gbps GDDR5||14Gbps GDDR6|
|Memory Bus Width||192-bit||192-bit||192-bit||256-bit|
|Single Precision Perf.||5.5 TFLOPS||6.5 TFLOPS||4.4 TFLOPS||7.5 TFLOPS (FE: 7.9 TFLOPS)|
|Manufacturing Process||TSMC 12nm "FFN"||TSMC 12nm "FFN"||TSMC 16nm||TSMC 12nm "FFN"|
|Launch Price||$279||$349||MSRP: $249|
We’ll go into the full ramifications of what NVIDIA has (and hasn’t) taken out of TU116 on the next page, but at a high level it’s still every bit a Turing GPU, save the RTX functionality (RT and tensor cores). This means that it has the same core architecture in its SMs, and is directly comparable to the likes of the RTX 2060. Or to flip things around the other direction, versus the older Pascal and Maxwell-based video cards, it comes with all of Turing’s performance and efficiency benefits for traditional graphics workloads.
Compared to RTX 2060 then, the GTX 1660 Ti is actually rather similar. For this fully-enabled TU116 card, NVIDIA has dialed back on the number of SMs a bit, going from 30 to 24, and memory clockspeeds have dropped as well, from 14Gbps to 12Gbps. But past that, the two cards are closer in specifications than we might expect to see for a $70 price tag difference, especially as NVIDIA has kept the 6GB of GDDR6 on a 192-bit memory bus. In an added quirk, the GTX 1660 Ti actually has a slightly higher average boost clockspeed than the RTX 2060, with its 1770MHz clockspeed giving it a 5% edge here.
The end result is that, on paper, the GTX 1660 Ti actually has a bit more ROP pixel pushing power than its bigger sibling thanks to that 5% boost clock advantage. However, the drop in the SM count definitely hits compute and texture performance, where GTX 1660 Ti is going to deliver around 85% of RTX 2060’s compute and shading throughput. Or to frame things in reference to the GTX 1060 6GB it replaces, the new card offers around 24% more compute/shader throughput (before taking architecture into account), a much smaller 4% increase in ROP throughput, and a very sizable 50% increase in memory bandwidth.
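For readers who want to check these paper figures, here is a minimal sketch of the arithmetic. It uses only numbers quoted in this article (SM counts, the 5% boost clock edge, memory data rates, bus widths, and the rounded TFLOPS ratings from the table). The function and variable names are illustrative, and scaling shader throughput linearly with SM count and clockspeed is a simplifying assumption that only holds because both Turing cards share the same SM design.

```python
def memory_bandwidth_gbs(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s: per-pin data rate times bus width, over 8 bits per byte."""
    return data_rate_gbps * bus_width_bits / 8


def relative_shader_throughput(sms: int, reference_sms: int, clock_ratio: float) -> float:
    """Paper throughput relative to a reference card with the same SM design (assumed linear scaling)."""
    return (sms / reference_sms) * clock_ratio


# GTX 1660 Ti vs. GTX 1060 6GB: both use a 192-bit bus, at 12Gbps vs. 8Gbps.
bw_1660_ti = memory_bandwidth_gbs(12, 192)   # 288.0 GB/s
bw_1060 = memory_bandwidth_gbs(8, 192)       # 192.0 GB/s
bandwidth_gain = bw_1660_ti / bw_1060 - 1    # 0.5, i.e. the 50% increase

# GTX 1660 Ti vs. RTX 2060: 24 SMs vs. 30 SMs, with a 5% boost clock edge.
vs_rtx_2060 = relative_shader_throughput(24, 30, 1.05)  # about 0.84

# GTX 1660 Ti vs. GTX 1060 6GB, from the rounded TFLOPS ratings in the table.
vs_gtx_1060_gain = 5.5 / 4.4 - 1  # about 0.25 with rounded inputs

print(f"Bandwidth: {bw_1660_ti:.0f} GB/s vs {bw_1060:.0f} GB/s (+{bandwidth_gain:.0%})")
print(f"Shader throughput vs RTX 2060: {vs_rtx_2060:.0%}")
print(f"Shader throughput gain vs GTX 1060 6GB: {vs_gtx_1060_gain:.0%}")
```

Because the table's TFLOPS ratings are rounded to one decimal place, the last figure lands at roughly 25% rather than the 24% quoted above; the difference is rounding, not a disagreement.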
Speaking of memory bandwidth, NVIDIA’s continued use of a 192-bit memory bus in this segment continues to be a somewhat vexing choice since it leads to such odd memory amounts. I’ll fully admit I would have liked to have seen 8GB here, but then that was the case for RTX 2060 as well. The flip side being that at least they aren’t trying to ship a card with just a 128-bit memory bus, as was the case for GTX 960. This puts GTX 1660 Ti in an interesting spot in terms of memory bandwidth, since it’s benefitting from the jump to GDDR6; if you thought the GTX 1060 could use a little more memory bandwidth, GTX 1660 Ti gets it in spades. This has also allowed NVIDIA to opt for cheaper 12Gbps GDDR6 VRAM, marking the first time we’ve seen this in any video card.
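To illustrate why a 192-bit bus leads to these memory amounts, the sketch below works through the capacity arithmetic. It rests on two assumptions that are not stated in this article: that each memory chip occupies a 32-bit slice of the bus, and that the chips in question are 1GB (8Gb) parts. The names are hypothetical and only serve the example.

```python
CHIP_INTERFACE_BITS = 32   # assumption: one GDDR5/GDDR6 chip per 32-bit slice of the bus
CHIP_CAPACITY_GB = 1       # assumption: 8Gb (1GB) memory chips


def chips_on_bus(bus_width_bits: int) -> int:
    """Number of memory chips needed to populate the bus, one chip per 32-bit slice."""
    return bus_width_bits // CHIP_INTERFACE_BITS


def capacity_gb(bus_width_bits: int, chip_capacity_gb: int = CHIP_CAPACITY_GB) -> int:
    """Total VRAM when every slice of the bus carries one chip of the given size."""
    return chips_on_bus(bus_width_bits) * chip_capacity_gb


for width in (128, 192, 256):
    print(f"{width}-bit bus: {chips_on_bus(width)} chips, {capacity_gb(width)}GB")
```

Under these assumptions a 192-bit bus carries six chips for 6GB, while 8GB would call for a 256-bit bus like the RTX 2070's; doubling the chip density on a 192-bit bus would jump straight to 12GB instead.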
Finally, taking a look at power consumption, we see that NVIDIA is going to be holding the line at 120W, which is the same TDP as the GTX 1060 6GB. This is notable because all of the other Turing cards to date have had higher TDPs than the cards they replace, leading to a broad case of generational TDP inflation. Of course we’ll see what actual power consumption is like in our testing, but right off the bat NVIDIA is setting up GTX 1660 Ti to be noticeably more power efficient than the RTX 20 series cards.
Wait, It's a GTX Card?
Along with the new TU11x family of GPUs, for this launch NVIDIA is also creating a new family of video cards: the GeForce GTX 16 series. With GTX 1660 Ti and its obligatory siblings lacking support for NVIDIA’s RTX family of features, the company has decided to clarify their product naming in a way that only NVIDIA can. The end result is that along with keeping the GTX prefix rather than RTX – since these parts obviously lack RTX functionality – the company is also giving them a lower series number. Overall it’s probably for the best that NVIDIA didn’t include these cards with the 20 series, lest we get another GeForce 4 situation.
But on the flip side, the number “16” also doesn’t have any great meaning to it; other than not being “20” the number is somewhat arbitrary. According to NVIDIA, they essentially picked it because they wanted a number close to 20 to indicate that the new GPU is very close in functionality and performance to TU10x, and thus “16” instead of “11” or the like. Of course I’m not sure calling it the GTX 1660 Ti is doing anyone any favors when the next card up is the RTX 2060 (sans Ti), but there’s nonetheless a somewhat clear numerical progression here – and at least for the moment, one not based on memory capacity.
Price, Product Positioning, & The Competition
Moving on, unlike NVIDIA’s other Turing card launches up until now – and unlike the GTX 1060 6GB – the GTX 1660 Ti is not getting a reference card release. Instead this is a pure virtual launch, as NVIDIA calls it, meaning all the cards hitting the shelves are customized vendor cards. Traditionally these launches tend to be closer to semi-custom cards – partners tend to use NVIDIA’s internal reference board design for their first cards – so we’ll have to see what pops up over the coming weeks and months. For now then, this means we’re going to see a lot of single and dual-fan cards, similar to the kinds of designs used for a lot of the GTX 1060 cards and some of the RTX 2070 cards.
Another constant across the Turing family has been price inflation, and the GTX 1660 Ti is no exception. With a launch price of $279, the new card is launching at $30 above the GTX 1060 6GB it replaces. This is a lot better than the $349 that NVIDIA wants for the RTX 2060, but in case anyone thought that the $250 price tag of the GTX 1060 was a fluke, then it’s clear that sub-$300 is the new norm for xx60 cards, and not sub-$200 as the GTX 960 flirted with. It’s also worth noting that NVIDIA won’t be launching with any bundles here; neither the RTX Game On bundle nor the GTX 1060 Fortnite bundles will be in play here, so what you see is what you get.
In terms of positioning against their own cards, NVIDIA is rolling out the GTX 1660 Ti as the successor to the GTX 1060 6GB, the latter of which are becoming increasingly rare in the market as NVIDIA’s unplanned Pascal stockpile is finally drawn down. So the GTX 1660 Ti and GTX 1060 won’t be sharing space on store shelves for long. However like the other Turing cards, the GTX 1660 Ti is not a true generational successor to the GTX 1060; at roughly 36% faster, NVIDIA is not expecting anyone to upgrade from their mid-range Pascal card to this. Instead, NVIDIA’s marketing efforts are going to be heavily focused on enticing GTX 960 users, who are a further generation back, to finally upgrade. In that respect the GTX 1660 Ti has a very large performance advantage, but this may be a tough sell since the GTX 960 launched at a much cheaper $199 price point.
As for AMD, the launch of the GTX 1660 Ti finally puts a Turing card in competition with their Polaris cards, particularly the $279 Radeon RX 590, a fight that the Radeon cannot win. While AMD hasn’t announced any price changes for the RX 590 at this time, AMD will have little choice but to bring it down in price.
Instead, AMD’s competitor for the GTX 1660 Ti looks like it will be the Radeon RX Vega 56. The company sent word last night that they are continuing to work with partners to offer lower promotional prices on the card, including a single model that was available for $279, but as of press time has since sold out. Notably, AMD is asserting that this is not a price drop, so there’s an unusual bit of fence sitting here; the company may be waiting to see what actual, retail GTX 1660 Ti card prices end up like. So I’m not wholly convinced we’re going to see too many $279 Vega 56 cards, but we’ll see. If nothing else, AMD’s Raise the Game Bundle is being offered, giving them an edge over NVIDIA in terms of pack-in games.
|Q1 2019 GPU Pricing Comparison|
|AMD||Price||NVIDIA|
|Radeon RX Vega 64||$499||GeForce RTX 2070|
| ||$349||GeForce RTX 2060|
| ||$329||GeForce GTX 1070|
|Radeon RX Vega 56*, Radeon RX 590||$279||GeForce GTX 1660 Ti|
| ||$249||GeForce GTX 1060 6GB|
|Radeon RX 580 (8GB)||$179/$189||GeForce GTX 1060 3GB|
Retycint - Tuesday, February 26, 2019
AMD selling overpriced cards does not subtract from the point that Nvidia is also attempting to raise the price as well. Both companies have put out underwhelming products this gen
Rocket321 - Friday, February 22, 2019
"finally puts a Turing card in competition with their Pascal cards" should say Polaris.
Ryan Smith - Friday, February 22, 2019
Boy I can't wait for Navi, since it sounds nothing like Turing...
Kogan - Friday, February 22, 2019
Aww, I was hoping this release would lower the price on those used 1070's. Oh well. I'll still probably go for a used 1070 over this one. Nearly identical in every way and can be found for as low as $200.
Hamm Burger - Friday, February 22, 2019
Reading "Turing Sheds" in the headline makes me wonder what he could have done with a couple of these at Bletchley Park (which, for anybody passing, is well worth the steep entry fee; see bletchleypark.org.uk).
Sorry for the interruption. I'll return you to the normal service.
Colin1497 - Friday, February 22, 2019
"Now the bigger question in my mind: why is it so important to NVIDIA to be able to dual-issue FP32 and FP16 operations, such that they’re willing to dedicate die space to fixed FP16 cores? Are they expecting these operations to be frequently used together within a thread? Or is it just a matter of execution ports and routing?"
It seems pretty likely that they added the FP16 cores because it simplified design, drivers, etc. It was easier to just drop in a few (as you mentioned) tiny FP16 cores than it was to change behavior of the architecture.
CiccioB - Friday, February 22, 2019
FP16 is a way to simplify shading computing over the common used FP32.
They allow for higher bandwidth (x2) and higher speed (x2, so half the energy for the same work) with the same HW space occupation. It was a feature used in HPC where bandwidth, power consumption and of course computation time are quite critical. They then ended in game class architecture just because they have find a way to exploit it there too.
Some games have started using FP16 for their shading. On AMD fence, only Vega class cards support packed FP16 math.
The use of a INT ALU that executes integer instructions together with the FP ones is instead an exclusive feature that can really improve shading performance much more than any other complex feature like high threaded (constantly interrupted) mechanism that is needed on architectures that cannot keep the ALUs feed.
In fact we see that with less CUDA cores Turing can do the same work of Pascal even using less energy. And no magic ACE is present.
Yojimbo - Friday, February 22, 2019
They didn't just drop in a few. It seems they have enough for 2x FP32 performance. Why are they dual issue? My guess is it is because that is what's necessary for Tensor Core operation. I think NVIDIA is being a bit secretive about the Tensor Cores. It's clear they took the RT Core circuitry out of the Turing minor die. As far as the Tensor Cores, I'm not so sure. Think about it this way: suppose Tensor Cores really are specialized separate cores. Then they also happen to have the capability of non tensor FP16 operation in dual issue with FP32 CUDA cores? Because if they don't then whatever functionality NVIDIA has planned for the FP16 cores on Turing minor would be incompatible with Turing major and Volta. I don't see how that can be the case, however, because, according to this review, Turing major is listed as the same CUDA compute generation as Turing minor. Now if the Tensor Cores can double as general purpose FP16 CUDA cores, then what's to say that FP16 and FP32 CUDA cores can't double as Tensor Cores? That is, if the Tensor Core can be made with two data flow paths, one following general purpose FP16 operations and one following Tensor Core instruction operations, then commutatively a general purpose CUDA core can be made with two data flow paths, one following general purpose operations and one following Tensor Core instruction operations.
When Turing came out with Tensor Core operations but with FP64 cores cut from the die and no increase in FP32 CUDA cores per SM over Volta I was surprised. But with this new information from the Turing Minor launch it makes more sense to me. I don't know if they have the dedicated FP16 cores on Volta. If they do then the FP64 cores don't need to play the following role, but if they are able to use the FP64 cores as FP16 cores then hypothetically they have enough cores to account for the 64 FMA operations per clock per SM of the 8 Tensor Cores per SM. But on Turing major they just didn't have the cores to account for the Tensor Core performance. These FP16 cores on Turing minor seem to be exactly what would be necessary to make up for the shortfall. So, my guess is that Turing major also has these same cores. The difference is either entirely one of firmware/drivers that allows the Tensor Core data path to be operated on Turing major but not Turing minor or Turing major has some extra circuitry that allows the CUDA cores to be lashed together with an alternate data flow path that doesn't exist in Turing minor.
GreenReaper - Friday, February 22, 2019
Agreed. It seems likely that most of the hardware is present, just not active.
Frankly, it's not clear why these couldn't be binned versions of the higher-level chips that haven't met the QA requirements, which would be one reason it took this long to release - you need enough stock to be able to distribute it. If it's planned out in advance, they just need X good CUDA cores and Y ROPs that run at Z Mhz, combined with at least [n] MB of cache. Fuse off the bad or unwanted portions to save on power and you're good.
Of course it *could* be like Intel, which truly make smaller derivatives. If so that suggests they'll be selling a lot of these cards. Even then, though, Yojimbo's supposition about the core design being essentially the same is likely to be true.
Yojimbo - Saturday, February 23, 2019
Yeah the die size and transistor count is still large for the number of CUDA cores, being that this review claims the 1660Ti has all SMs on the TU116 enabled. I said it was clear they took RT circuitry out. But I was wrong, that's not clear. It seems the die area per CUDA core and transistors per CUDA core of the TU116 are extremely close to the TU106, which is fully-enabled in RTX 2070. If this is the result of the INT32 and FP16 cores of the TU116 then where exactly do any cost savings of removing the Tensor Cores and RT Cores come from? Definitely the cost of completely re-architecting another GPU would outweigh the slight reduction in die size they seem to have achieved.
On the other hand, I'd imagine TU116 will be such a high volume part that unless yields are really lousy, binning alone won't provide enough chips (and where are the fully enabled versions of the 284 mm^2 RTX dies going, anywhere? No such product has thus far been announced.) Perhaps such a small number of RT cores was judged to be insufficient for RTX gaming. Even if not impossible to create some useful effects including that many RT cores, if developers were incentivized to target such few RT cores with their RTX efforts because the volume of such RT-enabled cards was significant then they may reduce the scope and scale of RTX enhancements they undertake, putting a drag on the adoption of the technology. So NVIDIA opted to disable the RT cores, and perhaps the Tensor Cores, present on the dies even when they are actually fully functioning. Perhaps it was simply cheaper to eat the wasted die space per chip than to design an entirely new GPU with the RT cores and Tensor Cores removed.