We’re following the state of play with Intel’s new CEO, Pat Gelsinger, very closely. Although he spent 30 years at Intel, rising to the rank of CTO, and then 12 years away from the company, his arrival has been met with praise across the spectrum given his background and previous successes. He isn’t set to take up his new role until February 15th, but his return is already causing a stir within Intel’s current R&D teams.

News in the last 24 hours, based on public statements, is that former Intel Senior Fellow Glenn Hinton, who counts being lead architect of Intel’s Nehalem CPU core among his achievements, is coming out of retirement to re-join the company. (The other lead architects of Nehalem are Ronak Singhal and Per Hammerlund - Ronak is still at Intel, working on next-gen processors, while Per has been at Apple for five years.)

Hinton is an old Intel hand with 35 years of experience. He led microarchitecture development of Pentium 4, was one of three senior architects of Intel’s P6 processor design (which led to Pentium Pro, P2, P3), and was ultimately one of the drivers of Intel’s Core architecture, which is still at the forefront of Intel’s portfolio today. He was also a lead microarchitect for Intel’s i960 CA, the world’s first super-scalar microprocessor. Hinton holds more than 90 patents from 8 CPU designs. He spent another 10+ years at Intel after Nehalem, but Nehalem is listed in many places as his primary public achievement at Intel.

In his social media posts, Hinton states that he will be working on ‘an exciting high performance CPU project’. In the associated comments, he also states that ‘if it wasn’t a fun project I wouldn’t have come back – as you know, retirement is pretty darn nice’. Glenn also discloses that he has been pondering the move since November, and that Gelsinger’s re-hiring helped finalize the decision. His peers also opine that Glenn is probably not the only ex-Intel architect who might be heading back to the company. We know of a few architects and specialists who have left Intel in recent years to join Intel's competitors, such as AMD and Apple.

There are a few key points worth considering here.

First is that coming out of retirement for a big CPU project isn’t a trivial thing, especially for an Intel Senior Fellow. Given Intel’s successes, one would assume that the financial situation is not the main driver here, but the opportunity to work on something new and exciting. Plus, these sorts of projects take years of development, at least three, and thus Glenn is signing on for a long term despite already having left to retire.

The second point reiterates that last line: whatever project Glenn is working on, it will be a long-term project. Assuming that Glenn is talking about a fresh project within Intel’s R&D ecosystem, it will be 3-5 years before we see the fruits of the labor, which also means creating a design aimed at what could be a variety of process node technologies. Glenn’s expertise as lead architect is quite likely applicable at any stage of an Intel R&D design window, but is perhaps best used from the initial stages. The way Glenn seems to put it, this might be a black-ops style design. He also doesn't specify whether this is x86, leaving that door open to speculation.

Third is to recognize that Intel has a number of processor design teams in-house and, despite the manufacturing process delays, they haven’t been idle. We’ve been seeing refresh after refresh of Skylake lead Intel's portfolio, and while the first iterations of the 10nm Cove cores come to market, Intel’s internal design teams would have been working on the next generation, and the generation after that; the only barrier to deployment would have been manufacturing. I recall a discussion with Intel’s engineers around Kaby Lake time, when I asked about Intel’s progress on IPC. I requested a +10% gen-on-gen increase over the next two years at the time, and I was told that those designs were done and baked, and that they were already working on the ones beyond that. Those designs were likely Ice/Tiger Lake, so Intel’s core design teams have been surging ahead despite manufacturing issues, and I wonder if there’s now a 3-4 year (or more) delay on some of these designs. If Glenn is hinting at a project beyond that, then we could be waiting even longer.

Fourth and finally, one of the critical points raised by a number of analysts on the announcement of Gelsinger’s arrival was that he wouldn’t have much of an effect until 3+ years down the line, because of how product cycles work. I rejected that premise outright, stating that Pat can come in and change elements of Intel’s culture immediately, and could sit in the room with the relevant engineers and discuss product design on a level that Bob Swan cannot. Pat has the opportunity to rearrange the leadership structure and instill new confidence in it. Some of those structures may have caused key architects in the past to retire instead of building on exciting projects.

As we can see, Pat is already having an effect before his name is even on the door at HQ.

Today is also Intel’s end-of-year financial disclosure, at 5pm ET. We are expecting Intel’s current CEO, Bob Swan, to talk through what looks to be another record-breaking year of revenue, and likely the state of play for Intel's own 7nm process node technologies. That last point is somewhat in doubt given the new CEO announcement, and it is unknown whether Gelsinger will participate in the call.

112 Comments

  • mode_13h - Monday, January 25, 2021 - link

    > doubt whether it was a massive role. Looking at the K8, it had only two more stages than K7's 10. And Core, at 14, was roughly the same as the Pentium M.

    If you're not comparing on the same process node, then it doesn't mean much.
  • mode_13h - Saturday, January 23, 2021 - link

    The P4's HT was sunk by lack of a good scheme for dealing with cache contention, IIRC. When Nehalem brought it back, I think that was the main addition.
  • GeoffreyA - Sunday, January 24, 2021 - link

    Along with too few execution resources, if I remember right, and compared with the Athlon.
  • mode_13h - Sunday, January 24, 2021 - link

    When I first got my P4, I wrote a little benchmark that ran 2 threads each doing some fundamentally serial computation. The result was almost a perfect 2x speedup. Another case it handled well was one thread doing memcpy() while another did complex computation - I think I got like 85% speedup, there.

    No doubt that Nehalem being much wider also made the payoff bigger, for HT. These days, I enjoy it quite a lot when compiling code. However, I find it helps very little, in AVX2-heavy workloads.
  • GeoffreyA - Sunday, January 24, 2021 - link

    That's quite a speedup. To this day, I have never tested HT on a Pentium 4. The only ones I had/have access to were a 1.7 GHz Willamette and 2.4 GHz Northwood.

    As for AVX2, I reckon it's because those instructions are already filling the FP units to the edge, so there's not much left for a second thread.
  • mode_13h - Monday, January 25, 2021 - link

    > As for AVX2, I reckon it's because those instructions are already filling the FP units to the edge, so there's not much left for a second thread.

    Yes, the code is highly-optimized with AVX2 in the hot loops. Since the CPU's AVX pipelines can be filled with a single, well-tuned thread, the result wasn't surprising (but noteworthy, nonetheless).

    Another use of SMT is to help hide long memory access latencies, but if your access patterns are sufficiently regular, then the prefetcher in modern CPUs can do that even better.
  • GeoffreyA - Tuesday, January 26, 2021 - link

    I've got a feeling that even with AVX512 support in Ice and Rocket Lake, AVX/2 code won't show much of a boost, likely because support is being implemented using two 256-bit halves.

    By the way, here's an excellent chart I found some years ago. Some useful data on the Pentium 4 as well.

    https://images.anandtech.com/reviews/cpu/intel/cor...
  • mode_13h - Thursday, January 28, 2021 - link

    Thanks for the chart. That's interesting. I gather the P6 is Pentium Pro? That mostly makes sense, but some things are much bigger than I thought, like the number of issue ports... I could swear it was only 2-way superscalar.
  • GeoffreyA - Sunday, January 31, 2021 - link

    "I gather the P6 is Pentium Pro?"

    It should be. But I'll take a look into this and return with an answer. Not sure how much changed in the Pentium II and III, beyond MMX, SSE, and cache. And yes, I expected it to have 3 issue ports myself.
  • GeoffreyA - Sunday, January 31, 2021 - link

    According to Agner Fog, there are 5 ports, with various execution units, but owing to register renaming being limited to 3 uops earlier on, in most cases you'll be limited to 3 uops (pp. 76, 80).

    https://www.agner.org/optimize/microarchitecture.p...
