Intel Unveils Lunar Lake Architecture: New P and E cores, Xe2-LPG Graphics, New NPU 4 Brings More AI Performance
by Gavin Bonshor on June 3, 2024 11:00 PM EST
Intel Lunar Lake: New E-Core, Skymont Takes Flight For Peak Efficiency
For Lunar Lake's efficiency cores, Intel has opted for Skymont, which is designed primarily for efficiency while maintaining a solid level of performance at a lower power envelope.
The Skymont cores feature a significantly broader decode architecture, with a 9-wide decode stage that includes 50% more decode clusters than previous generations. This is supported by a larger micro-op queue, which now holds 96 entries compared to 64 in the previous Crestmont E-Cores.
Intel has improved the out-of-order execution engine by boosting the allocation width from 4 to 8 and the retire width from 8 to 16. With double the allocation and retire width of the Crestmont E-Cores, Skymont should be able to issue and commit more out-of-order instructions per cycle, decreasing overall latency and minimizing stalls on data dependencies.
Queuing and buffering capabilities have also been improved within the Skymont E-Core. It features a deeper reorder buffer of 416 entries, up from the previous 256, while Intel claims the physical register files (PRF) and the integer, memory, and vector resources have been made deeper, too.
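As a rough way to put these front-end and back-end changes side by side, the short Python sketch below computes the generational scaling from the figures quoted above. The dictionary keys and the `scaling` helper are illustrative names rather than Intel terminology, and the Crestmont values are simply the "previous" numbers as stated in this article.

```python
# Per-core structure sizes as quoted in the article (not independently verified).
CRESTMONT = {"uop_queue": 64, "alloc_width": 4, "retire_width": 8, "reorder_buffer": 256}
SKYMONT = {"uop_queue": 96, "alloc_width": 8, "retire_width": 16, "reorder_buffer": 416}


def scaling(old: dict, new: dict) -> dict:
    """Return new/old for every structure present in both dictionaries."""
    return {name: new[name] / old[name] for name in old if name in new}


if __name__ == "__main__":
    for name, ratio in scaling(CRESTMONT, SKYMONT).items():
        print(f"{name}: {ratio:.3f}x")
```

On these numbers, the reorder buffer grows by roughly 1.6x, which is less than the 2x increase in allocation and retire width.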
Focusing on dispatch ports, Intel has opted for a similar approach to Crestmont. Skymont has 26 dispatch ports, including 8 integer ALUs and 3 jump ports, and it supports 3 load operations per cycle. Regarding vector performance, Skymont supports 4x128-bit FP and SIMD vectors, which doubles peak GFLOPS/TOPS and reduces latency for floating-point operations.
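To make the vector claim concrete, here is a minimal sketch of how pipe count translates into elements processed per cycle. It assumes the previous generation had two 128-bit vector pipes, which is only inferred from the "doubles" wording above, and it ignores clock speed and fused multiply-add. The function and constant names are hypothetical.

```python
SKYMONT_PIPES = 4        # "4x128-bit" as quoted in the article
ASSUMED_PREV_PIPES = 2   # ASSUMPTION: inferred from the article's "doubles" claim
VECTOR_BITS = 128        # width of each vector pipe


def lanes_per_cycle(pipes: int, vector_bits: int, element_bits: int) -> int:
    """Elements processed per cycle if every vector pipe is busy."""
    return pipes * (vector_bits // element_bits)


if __name__ == "__main__":
    for label, bits in (("FP32", 32), ("INT8", 8)):
        new = lanes_per_cycle(SKYMONT_PIPES, VECTOR_BITS, bits)
        old = lanes_per_cycle(ASSUMED_PREV_PIPES, VECTOR_BITS, bits)
        print(f"{label}: {old} -> {new} lanes per cycle ({new / old:.1f}x)")
```

Under that assumption, the per-cycle lane count doubles for any element width, since only the number of pipes changes.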
Intel does provide some figures highlighting Skymont's power efficiency and performance when compared directly to the Low Power Island E-Cores included on the SoC tile on Meteor Lake. In Intel's line chart, Skymont delivers 1.7X the single-threaded performance while consuming just one-third of the power relative to Meteor Lake's LP E-cores.
Looking at multi-threaded performance, Intel puts Skymont 2.9X faster at 1/3rd of the power requirements when compared to Meteor Lake and the LP E-cores. It's worth noting that the Skymont E-Core cluster on the compute tile has double the cores of the Meteor Lake LP E-Core cluster (4 vs. 2), so performance is expected to be higher overall.
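The arithmetic behind these claims can be sketched in a few lines of Python. This is a back-of-the-envelope calculation only: it assumes the performance and power ratios apply at the same operating point, which Intel's chart may not imply, and the function names are made up for illustration.

```python
# Intel's claimed ratios versus Meteor Lake's LP E-cores, as quoted in the article.
ST_PERF_RATIO = 1.7    # single-threaded performance
MT_PERF_RATIO = 2.9    # multi-threaded performance
POWER_RATIO = 1 / 3    # power draw
SKYMONT_CORES = 4      # Skymont cluster size on Lunar Lake
MTL_LP_CORES = 2       # LP E-core cluster size on Meteor Lake


def perf_per_watt_gain(perf_ratio: float, power_ratio: float) -> float:
    """Relative perf/W, ASSUMING both ratios hold at the same operating point."""
    return perf_ratio / power_ratio


def per_core_gain(mt_ratio: float, new_cores: int, old_cores: int) -> float:
    """Multi-threaded gain left over after dividing out the core-count increase."""
    return mt_ratio / (new_cores / old_cores)


if __name__ == "__main__":
    print(f"ST perf/W gain: {perf_per_watt_gain(ST_PERF_RATIO, POWER_RATIO):.2f}x")
    print(f"MT perf/W gain: {perf_per_watt_gain(MT_PERF_RATIO, POWER_RATIO):.2f}x")
    print(f"MT gain per core: {per_core_gain(MT_PERF_RATIO, SKYMONT_CORES, MTL_LP_CORES):.2f}x")
```

Dividing out the doubled core count leaves a per-core multi-threaded gain of about 1.45x, which is why the 2.9X headline figure should be read with the cluster size in mind.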
Due to their low-powered nature for mobile devices such as notebooks, the Skymont E-cores are designed to be very flexible, with some leverage over previous E-Core architectures. Compared to Raptor Cove, Intel claims Skymont offers 2% better integer and floating-point performance in single-threaded workloads, with a power and thermal envelope almost identical to Raptor Cove. This comparison reflects a more desktop-friendly environment, as Intel depicts the data with the Skymont cores on an LLC or a Ring Bus. In other words, this is a new E-Core measured against the previous generation of P-Cores.
Intel's Skymont E-cores represent the next leap in Intel's architectural development. According to Intel's disclosure, Skymont looks to be a marked improvement in multiple areas over the previous Crestmont E-Core, including decoding, execution, memory subsystems, and power efficiency. While Intel discloses them as E-Cores, the messaging surrounding Skymont is a little confusing.
The easiest way to decipher this is that they are similar to the two LP E-Cores within the Meteor Lake SoC tile, but with Lunar Lake, they are in a cluster of four built onto the compute tile. On Lunar Lake, they will be as efficient as the LP E-cores of old, but for desktop, they will be in a cluster on the chip's Ring Bus, meaning they will likely be similar to the traditional E-cores we've seen before with Intel's 14th/13th/12th Gen Core families.
91 Comments
thestryker - Monday, June 3, 2024
I'm curious what the overall E-core performance is going to look like since the cluster won't have L3 cache access. Chips and Cheese did some analysis of the LP E-cores on MTL and found this specifically to be a big negative. I'm guessing this design is going to be limited to just LNL and is predominantly for the power savings.
ET - Tuesday, June 4, 2024
Interestingly, Intel is comparing Skymont to Raptor Cove. I agree that we have to wonder how the L3 (or lack thereof) affects this, but from the Chips and Cheese figures alongside Intel's performance improvement figures, it looks like Skymont without L3 cache will be faster than Crestmont with L3 cache.
kwohlt - Tuesday, June 4, 2024
There's 8MB of "SOC cache", separate from both the P and E cores, that should in practice function as the E cores' L3
thestryker - Tuesday, June 4, 2024
That's my assumption as well as I think the GPU would be the other part predominantly using it and they shouldn't really both be hitting it at the same time.
sharath.naik - Monday, June 10, 2024
Side cache is not the same as L3, or I think they would have called it that. Shared L3 is where the memory sync can happen across cores. If not, it needs to go all the way back to RAM. So, side caches really cannot be considered as L3, more like expanded L2 for E-Core and expanded L3 for P-Core? That is my guess. Yes, it means things that run on both E-Core and P-Core at the same time will take a hit on performance. I think they were targeting the majority use case, where most won't need more than 4 threads or threads won't be working on the same data.
powerarmour - Thursday, June 6, 2024
I can see this being an embarrassing launch if it gets slapped around by Qualcomm's SDx Elite
mode_13h - Friday, June 7, 2024
Well, they're on a better node than Qualcomm, so there's that.
sharath.naik - Monday, June 10, 2024
It absolutely will. Because this is going to be slower than Meteor Lake in CPU. Elite is supposed to be 30% faster. Intel should have released an 8 P-core version to compete in performance. But I think they wanted to reserve that to be produced on their own fabs.
lmcd - Monday, June 17, 2024
Snap Elite is supposed to be 30% faster at essentially-undisclosed power. Lunar Lake will ironically undercut the Snapdragon Elite on power and cost while delivering good performance.
Drumsticks - Tuesday, June 4, 2024
I hate to ask this, but was this article fully written by Gavin and proof'ed by another editor? Was there a deadline push to get it out as soon as Intel released the information on Lunar Lake? It just reads so, so disjointed. It feels like there are so many issues in this paragraph alone on the P-core overview; it feels jarring to read.
"This Lion Cove architecture **also aligns with performance increases**, boasting a predicted double-digit bump in IPC over the older Redwood Cove generation. This uplift is noticed, especially **in the betterment of its hyper-threading, whereby improved IPC** by 30%, dynamic power efficiency improved by 20%, **and previous technologies, in balancing**, without increasing the core area, **in a commitment of Intel to better performance**, within existing physical constraints."
I've seen so much better work from Gavin, and Anandtech in general, that I almost hope that this page was heavily written by software. I know it's a press release, and there's not a whole lot of information, but the level of first party detail here feels similar to the Architecture Day 2021 presentations Intel did on Alder Lake, which got fantastic coverage from Andrei and Dr. Cuttress, and here it feels like we are getting a poorly worded restating of the slides with hardly any analysis or greater than surface level understanding.
I've been reading Anandtech since I was 15, and the level of detail in the Sandy Bridge era articles honestly had a huge influence on my choice to pursue a career in CPU Design. I've mountains of respect for what Anandtech has published in the past, but this article feels rushed.