The (now) not so “secret sauce” behind ARM Cortex-A72

From CADENCE DESIGN SYSTEMS, INC. First Quarter 2015 Financial Results Conference Call, April 27, 2015

In early March, we launched Innovus, which is our new internally developed product for digital implementation.

  • Innovus delivers 10 to 20 percent improvement in performance, power and area, and uses massively parallel computing to provide up to a 10X reduction in turnaround time.
  • Innovus joins Tempus, Voltus and Quantus QRC to provide our customers with a next-generation suite of digital implementation and signoff tools for advanced design. We expect these products to drive strong growth in digital, which is the largest segment in EDA.
  • Several customers have already shared their positive experiences using Innovus, including ARM, Freescale, Juniper, MaxLinear, Renesas and Spreadtrum. In fact, ARM used Innovus during the development of the new Cortex-A72 processor, achieving 2.6 gigahertz performance.

From CDNLive: Cadence User Conference 2015, Bengaluru, India, August 18-19

ICD109
“3GHz And Beyond” – Jumping to the New Performance Spectrum with Cortex-A72 ARM
Devyani Patra – ARM

The need for speed keeps increasing as we move across technology nodes and architectural upgrades on ARM MPCore processors. High-end mobile and server applications raise the performance bar year after year. We look at a multitude of these implementations on the TSMC 16nm FinFET node, aided by the wide range of options provided by EDI. These have been optimized to define a new performance spectrum that starts at 3GHz and goes beyond. In this paper, I will present several implementation aspects that are key to achieving 3GHz with POP IP.

July 29, 2015: “Tackling 16nm Challenges for ARM Cortex-A72 Processor” on the Cadence Design Systems YouTube channel

ARM’s Brent McKanna, a senior principal design engineer, leads CPU implementation at the company. In this 2-minute video, McKanna talks about how ARM and Cadence worked together to address the challenges of developing the ARM Cortex-A72 processor at 16nm. From changing process rules and IP to tool readiness and routability concerns, the teams overcame these challenges using Cadence® Encounter® and Innovus™ technologies to develop a higher-performing, lower-power processor. Now, the companies are ready to tackle the challenges of 10nm. After watching the video, you can learn more about the Innovus Implementation System here: http://bit.ly/1DbfkRb

Feb 4, 2015: “Cadence announces complete SoC development environment for new ARM mobile IP suite” by Brian Fuller, Editor in Chief at Cadence (at ARM since June 2015)

When we look back on today, we might say it was the dawn of server-in-your-pocket technology.

That’s because ARM announced a new premium mobile IP suite (New Announcements: ARM Sets New Standard for the Premium Mobile Experience) that includes the ARM Cortex-A72 64-bit processor core, ARM Mali-T860 and T880 GPUs and a new, faster CoreLink interconnect.

In tandem, Cadence announced a reference flow for the suite that supports advanced manufacturing processes including TSMC 16-nanometer FinFET Plus. Also available with the Cadence flow are ARM Artisan physical IP and ARM POP IP for the ARM Cortex-A72 processor and ARM Mali-T860 and T880 GPUs, enabling designers to meet aggressive processor performance and power goals.

[Figure: Cortex-A72 chip diagram]

Dr. Chi-Ping Hsu, Cadence senior vice president and chief strategy officer for EDA, said the two companies worked together to ensure that the Cadence flow allows customers to integrate the ARM Mali-T880 GPU and ARM CoreLink CCI-500 to achieve optimal results at advanced process nodes.

ARM used the Cadence digital and system-to-silicon verification tools and IP during the ARM Cortex-A72 processor development to ensure that the flow met complex mobile design requirements, Hsu added.

So why a server in your pocket? This innovation brings the processing power of a server, or more, to the mobile device in your pocket.

What’s more, it won’t burn a hole in your pocket. The Cortex-A72 is 3.5x faster than 2014’s 32-bit Cortex-A15 and consumes 75 percent less power, according to ARM.

The Register reported that the A72 has twice the performance and half the power of ARM’s 64-bit flagship, the A57. That processor has been targeted at servers, among other applications. So, a server in your pocket? Indeed.


March 10, 2015: “Cadence Introduces Innovus Implementation System, Delivering Best-in-Class Results with Up to 10X Reduction in Turnaround Time”, company press release

  • Provides typical 10 to 20 percent production-proven advantage in power, performance and area
  • First massively parallel implementation solution in the industry, enabling unprecedented speed and capacity
  • Supports advanced 16/14/10nm FinFET and established process nodes
  • Next-generation platform eases usability and boosts engineering productivity

Cadence Design Systems, Inc. (NASDAQ: CDNS) today unveiled Cadence® Innovus™ Implementation System, its next-generation physical implementation solution that enables system-on-chip (SoC) developers to deliver designs with best-in-class power, performance and area (PPA) while accelerating time to market. Driven by a massively parallel architecture with breakthrough optimization technologies, the Innovus Implementation System provides typically 10 to 20 percent better PPA and up to 10X full-flow speedup and capacity gain at advanced 16/14/10nm FinFET processes and established process nodes.

For more information on the Innovus Implementation System, please visit www.cadence.com/news/innovus.

The Innovus Implementation System was designed with several key capabilities to help physical design engineers achieve best-in-class performance while designing for a set power/area budget or realize maximum power/area savings while optimizing for a set target frequency. The key Innovus capabilities to achieve this include:

  • New GigaPlace solver-based placement technology that is slack-driven and topology-/pin access-/color-aware, enabling optimal pipeline placement, wirelength, utilization and PPA, and providing the best starting point for optimization
  • Advanced timing- and power-driven optimization that is multi-threaded and layer aware, reducing dynamic and leakage power with optimal performance
  • Unique concurrent clock and datapath optimization that includes automated hybrid H-tree generation, enhancing cross-corner variability and driving maximum performance with reduced power
  • Next-generation slack-driven routing with track-aware timing optimization that tackles signal integrity early on and improves post-route correlation
  • Full-flow multi-objective technology enables concurrent electrical and physical optimization to avoid local optima, resulting in the most globally optimal PPA

The Innovus Implementation System also offers multiple capabilities that boost turnaround time for each place-and-route iteration. Its core algorithms have been enhanced with multi-threading throughout the full flow, providing significant speedup on industry-standard hardware with 8 to 16 CPUs. Additionally, the Innovus Implementation System features the industry’s first massively distributed parallel solution that enables the implementation of design blocks with 10 million instances or larger. Multi-scenario acceleration throughout the flow improves turnaround time even with an increasing number of multi-mode, multi-corner scenarios.

In addition to providing best-in-class PPA and optimized turnaround time, the Innovus Implementation System offers a common user interface (UI) across synthesis, implementation and signoff tools, and data-model and API integration with the Tempus™ Timing Signoff solution and Quantus™ QRC Extraction solution. Together these solutions enable fast, accurate, 10nm-ready signoff closure that facilitates ease of adoption and an end-to-end customizable flow. Customers can also benefit from robust visualization and reporting that enables enhanced debugging, root-cause analysis and metrics-driven design flow management.

“At ARM, we push the limits of silicon and EDA tool technology to deliver products on tight schedules required for consumer markets,” said Noel Hurley, general manager, CPU group, ARM. “We partnered closely with Cadence to utilize the Innovus Implementation System during the development of our ARM® Cortex®-A72 processor. This demonstrated a 5X runtime improvement over previous projects and will deliver more than 2.6GHz performance within our area target. Based on our results, we are confident that the new physical implementation solution can help our mutual customers deliver complex, advanced-node SoCs on time.”

“Customers have already started to employ the Innovus Implementation System to help achieve higher performance, lower power and minimized area to deliver designs to the market before the competition can,” said Dr. Anirudh Devgan, senior vice president of the Digital and Signoff Group at Cadence. “The early customers who have deployed the solution on production designs are reporting significantly better PPA and a substantial turnaround time reduction versus competing solutions.”

About Cadence
Cadence enables global electronic design innovation and plays an essential role in the creation of today’s integrated circuits and electronics. Customers use Cadence software, hardware, IP and services to design and verify advanced semiconductors, consumer electronics, networking and telecommunications equipment, and computer systems. The company is headquartered in San Jose, Calif., with sales offices, design centers and research facilities around the world to serve the global electronics industry. More information about the company, its products and its services is available at www.cadence.com.

March 17, 2015: “Cadence’s New Digital Implementation System: An Inside Look” by Brian Fuller, Editor in Chief at Cadence (at ARM since June 2015)

SANTA CLARA, Calif. – The launch of Cadence’s new Innovus Implementation System heralds “a new era” in physical implementation technology, breaking longstanding electronic system-design bottlenecks, according to Rahul Deokar, product management director with Cadence.

Deokar gave a technical overview of the new technology at CDNLive Silicon Valley, just hours after it was unveiled during a keynote address at the annual event (March 10). “Older implementation tools had forced you guys as designers to do smaller design blocks,” he told a standing-room-only audience at the Santa Clara Convention Center here. “You can now handle 5-10 million instance design blocks…and you can take weeks or even months off your SoC design schedules.”


Leapfrog Effect

Against the backdrop of SoC implementation, the new technology represents a fundamental overhaul of the Encounter system that “leapfrogs” the industry and delivers a far more compelling digital implementation solution than the industry has seen before, Deokar said.

Previously, optimizing for power, performance and area (PPA) and improving turn-around time (TAT) was an either-or choice, he said.

“These were two conflicting objectives in a lot of ways. Traditional tools have effectively tackled just one or the other. However, what good is it if the tool runs super-fast but ends up with sub-optimal PPA?” Deokar said. “Innovus gets you the best of both worlds on TAT and PPA.”

By running up to 10x faster, Innovus can complete design blocks that took 7-10 days in 1-2 days. The 10-20 percent PPA improvement is equivalent to a half-node or even a full-node transition, without actually moving to the new node, he added.

Furthermore, because the technology is integrated with Cadence signoff solutions, significant additional productivity gains can be achieved along the flow, he noted.

And Innovus is not targeted at just bleeding-edge nodes such as 16/14/10nm; it has vast utility for established process nodes as well, Deokar said.

Driving improved TAT

A massively parallel architecture is key to improved turn-around time, Deokar said. The core algorithms have been improved such that “even if you’re not running on 16 or 32 or 64 CPUs, the core algorithms of placement, optimization and routing have been sped up. Even on 2- and 4-CPU machines, you should be able to see TAT advantages,” he said. “Now, add multithreading, distributed network processing and MMMC (multi-mode/multi-corner) scenario acceleration, and you get the complete massively parallel system.”

That means teams working on very large chips, who previously had to divide the SoC into many blocks to manage placement and routing complexity, can now work with fewer blocks, which cuts design time and saves money, Deokar said.

He cited as one example a 28nm, 2.8-million-cell networking IP block running on 8 CPUs (pictured) that saw implementation time cut from 336 hours to 48 hours, a 7x improvement.

[Slide 12: 28nm networking IP implementation example]
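The quoted runtime numbers can be sanity-checked with simple arithmetic. The helper below is a hypothetical illustration, not part of any Cadence tool; it only uses the 336-hour and 48-hour figures from Deokar's example and expresses them in days as well.

```python
def speedup(before_hours: float, after_hours: float) -> float:
    """Ratio of old runtime to new runtime (higher is better)."""
    return before_hours / after_hours


# 28nm, 2.8-million-cell networking IP on 8 CPUs, figures as quoted above
before, after = 336, 48
print(f"{before / 24:.0f} days -> {after / 24:.0f} days, {speedup(before, after):.1f}x")
# prints "14 days -> 2 days, 7.0x"
```

In other words, the example moves a two-week implementation run down to two days, which is consistent with the "7-10 days to 1-2 days" range quoted earlier.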

Pushing PPA

The other key Innovus benefit, improved PPA, represents a big step forward, he said.

Traditionally, placement ran on heuristic-based algorithms, but GigaPlace in Innovus is solver- or equation-based.

“That means you can model in the equation a lot of different design variables – timing, slack, power, wire length, congestion, layer awareness,” Deokar said. “GigaPlace concurrently solves both the electrical and physical objectives. As a result you get better PPA.”
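To make "solver- or equation-based" placement concrete, here is a minimal sketch of textbook quadratic (analytical) placement in one dimension. This is an assumption for illustration only: GigaPlace's actual formulation is not described in the source and models many more terms (slack, power, congestion, layers). Here each weighted 2-pin net contributes w·(x_u − x_v)², and setting the gradient to zero yields a linear system A·x = b. The names `quadratic_place_1d` and `solve` are hypothetical, and every movable cell must connect, directly or through other cells, to at least one fixed pad so the system is non-singular.

```python
from typing import Dict, List, Tuple, Union

# An endpoint is either a movable cell (int index) or a fixed pad (str name).
Endpoint = Union[int, str]


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (dense, small systems only)."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def quadratic_place_1d(
    n_movable: int,
    nets: List[Tuple[Endpoint, Endpoint, float]],
    pads: Dict[str, float],
) -> List[float]:
    """Minimize sum(w * (x_u - x_v)**2) over weighted 2-pin nets.

    Each net adds its weight to the diagonal entry of every movable endpoint,
    subtracts it off-diagonal between two movable cells, and pulls a movable
    cell toward a fixed pad through the right-hand side b.
    """
    A = [[0.0] * n_movable for _ in range(n_movable)]
    b = [0.0] * n_movable
    for u, v, w in nets:
        for p, q in ((u, v), (v, u)):
            if isinstance(p, int):  # only movable cells get an equation
                A[p][p] += w
                if isinstance(q, int):
                    A[p][q] -= w
                else:
                    b[p] += w * pads[q]
    return solve(A, b)


# Chain: pad L (x=0) - c0 - c1 - c2 - pad R (x=10), all weights 1.
nets = [("L", 0, 1.0), (0, 1, 1.0), (1, 2, 1.0), (2, "R", 1.0)]
xs = quadratic_place_1d(3, nets, {"L": 0.0, "R": 10.0})
print([round(x, 3) for x in xs])  # [2.5, 5.0, 7.5]
```

The cells spread evenly between the pads because that minimizes total squared wirelength. A production placer would add further terms (timing weights on critical nets, density spreading, congestion) to the same objective rather than applying them as separate heuristic passes, which is the contrast Deokar draws.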

Another feature is that Innovus is now power aware throughout the optimization process, Deokar said.

“All the transforms that were timing and area aware, now power is a part of that same cost function,” he told his audience.
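The idea of folding power into the same cost function as timing and area can be sketched as a weighted sum used to rank candidate transforms. This is a hypothetical illustration: the candidate names, weights and numbers are made up, and Innovus's real cost function is not described in the source. The point it shows is that adding a power weight can change which transform wins, even when all the timing-clean options are unchanged.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str        # hypothetical transform result (e.g. a resized cell)
    slack_ps: float  # timing slack after the transform; negative = violation
    area: float      # arbitrary area units
    power: float     # arbitrary power units (dynamic + leakage)


def cost(c: Candidate, w_timing: float, w_area: float, w_power: float) -> float:
    # Only negative slack is penalized; extra positive slack earns nothing.
    timing_penalty = max(0.0, -c.slack_ps)
    return w_timing * timing_penalty + w_area * c.area + w_power * c.power


def pick(cands: List[Candidate], w_timing: float, w_area: float, w_power: float) -> Candidate:
    return min(cands, key=lambda c: cost(c, w_timing, w_area, w_power))


cands = [
    Candidate("fast_leaky", slack_ps=2.0, area=1.2, power=8.0),
    Candidate("larger_low_leak", slack_ps=1.0, area=1.5, power=3.0),
    Candidate("small_but_late", slack_ps=-3.0, area=1.0, power=1.0),
]
print(pick(cands, w_timing=10.0, w_area=1.0, w_power=0.0).name)  # fast_leaky
print(pick(cands, w_timing=10.0, w_area=1.0, w_power=0.5).name)  # larger_low_leak
```

With a timing-and-area-only cost, the smaller but leakier option wins; once power carries weight, the slightly larger low-leakage option becomes the cheapest choice that still meets timing.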

A third key component is that the concurrent clock and datapath optimization technology from Azuro, which Cadence acquired in 2011, is now fully integrated.

“A lot of high-performance designers have unique clocking methodologies: H-trees, clock meshes, multipoint CTS,” he said. “You guys invest a lot of manual effort building these, but since these are customized, they’re not flexible when process and technology changes occur.”

The CCOpt FlexH feature integrated into Innovus is a combination of a regular clock tree and an H-tree, he said. “You get the best of both worlds in automation and in cross-corner variation, as well as a high-performance and power-efficient clock network.”
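Why H-trees appeal to high-performance designers can be seen from the geometry alone. The sketch below builds only a textbook symmetric H-tree, under the assumption of a rectangular region and ideal recursive subdivision; it is not FlexH, whose hybrid algorithm the source does not describe. The function `h_tree_taps` is hypothetical. It returns each tap point together with its root-to-tap wirelength, showing that every tap sits at the same wire distance from the root, which is what keeps clock skew low.

```python
from typing import List, Tuple

Tap = Tuple[Tuple[float, float], float]  # ((x, y), root-to-tap wirelength)


def h_tree_taps(x: float, y: float, width: float, height: float,
                depth: int, path: float = 0.0) -> List[Tap]:
    """Tap points of a textbook symmetric H-tree over a width x height region.

    Each level draws an 'H' centered at (x, y) whose four tips sit at the
    centers of the region's quadrants, (x +/- width/4, y +/- height/4).
    Each tip then drives a half-size H inside its own quadrant.
    """
    if depth == 0:
        return [((x, y), path)]
    # Wire from the H center to any one tip: half the crossbar (width/4)
    # plus half of one vertical leg (height/4).
    step = width / 4 + height / 4
    taps: List[Tap] = []
    for dx in (-width / 4, width / 4):
        for dy in (-height / 4, height / 4):
            taps += h_tree_taps(x + dx, y + dy, width / 2, height / 2,
                                depth - 1, path + step)
    return taps


taps = h_tree_taps(50.0, 50.0, 100.0, 100.0, depth=2)
print(len(taps), {length for _, length in taps})  # 16 {75.0}
```

The rigidity Deokar mentions is also visible here: taps land only on a fixed grid of quadrant centers, so real sinks that are not there need additional local tree branches, which is the kind of gap a hybrid H-tree/clock-tree approach aims to fill.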

Deokar also highlighted the track-aware optimization features of NanoRoute.

“Before you go into your detailed route step, right after track assignment to the different metal layers, we do timing-aware optimization,” Deokar said. “This proactively prevents signal integrity issues from occurring downstream in the flow, and dramatically reduces the timing jump between pre-route and post-route optimization.”

Finally, Deokar noted productivity gains from the integration of Innovus with existing Cadence signoff technologies such as Tempus, Voltus and Quantus, a common user interface, and reporting and visualization enhancements.

More information about Innovus can be found by navigating to the technology’s landing page.

About Nacsa Sándor

Lazure Kft. • infocommunication cloud expertise • high-tech marketing • contact: snacsa@live.com
Graduate electrical and automation engineer (1971). Employers: veteran of Microsoft, EMC, Compaq and Digital; earlier, Hungarian companies (GDS Szoftver, Computrend, SzáMOK, OLAJTERV). Currently at Lazure Kft.
What I am professionally proud of (in reverse chronological order):
– Introducing Microsoft .NET 1.0 … .NET 3.5 and Visual Studio Team System in Hungary (2000-2008)
– Making Digital Alpha technology the leading data-center and enterprise server platform in Hungary, as a member of a joint team (1993-1998)
– Conceptual modeling (known today as domain-driven design) combined with object-oriented programming (1985-1993)
– Postgraduate training in minicomputer software development, concurrent (parallel) programming and other topics (1973-1984)
Areas I have been working on recently: see lazure2.wordpress.com (Experiencing the Cloud)
– Predictive strategies based on the cyclical nature of ICT development (also based on my previous findings during the period of 1978-1990)
– User Experience Design for the Cloud
– Marketing Communications based on the Cloud