
The state of big.LITTLE processing


Complementary post reminder: Eight-core MT6592 for superphones and big.LITTLE MT8135 for tablets implemented in 28nm HKMG are coming from MediaTek to further disrupt the operations of Qualcomm and Samsung [‘Experiencing the Cloud’, July 20, 2013], from which I include the following excerpts, as they relate directly to the content given here:
There are also two software models now available, that ARM and Linaro have developed to enable control of workloads, performance, and power management on big.LITTLE SoCs. … The second is the Global Task Scheduling (GTS) [also known as big.LITTLE MP] software developed (and now named) by ARM.
Until GTS functionality is fully upstream, ARM is supporting the big.LITTLE MP patch set for its licensees, leveraging Linaro’s public monthly and Linaro LSK builds, so that it is available to all ARM licensees for product integration and deployment. Linaro also expects to provide a topic branch for the latest work available on the upstream GTS implementation for interested developers.
ARM and Linaro now recommend product development and deployment to be based on the GTS solution. However, there are some cases where hardware limitations or a requirement for the traditional Linux scheduler (for example in some embedded applications) may lead to IKS still being required.
Real Life Results
ARM has published further information on big.LITTLE configurations and performance in a blog entry here [Ten Things to Know About big.LITTLE [Brian Jeff on SoC Design blog of ARM, June 18, 2013]].
The first commercial products based on big.LITTLE are certain international versions of the latest Galaxy S4 phone from Linaro member, Samsung. Samsung-LSI provide an ‘Octa-core’ 4+4 big.LITTLE chip for this phone. As has been publicly noted, the current generation of hardware cannot yet take full advantage of the IKS or the GTS designs because the hardware power-saving core switching feature is implemented on a cluster basis rather than on a per-core or a per-pair basis. …
End of the complementary post reminder

The first big.LITTLE device (Samsung Galaxy S4, Exynos 5 Octa version) was announced in mid-March and will hopefully be available from the end of April at the earliest, and only in a few countries (the US is one of them). The price is also far too high: $1,379 unlocked on Amazon. 70% of the first 10M S4 smartphones will come with the quad-core Snapdragon S600 instead (seemingly for as little as $800). The reason: Samsung Semiconductor is just entering 28nm production with this SoC, so it is “scheduled for mass-production in the second quarter of 2013”. While we should therefore probably wait until Q3 for larger-scale availability, it is already time to examine both the product and the form of big.LITTLE processing delivered with it:

Introducing Samsung GALAXY S 4 [Samsung Mobile Press, March 14, 2013]

Developed to redefine the way we live, the GALAXY S 4 makes every moment of our life meaningful. It understands the value of relationships, enables true connections with friends and family, and believes in the importance of effortless experience.
Highly crafted design with a larger screen and battery, thin bezel, housed in a light 130g and slim 7.9mm chassis. The new Samsung GALAXY S 4 is slimmer, yet stronger.
The GALAXY S 4 gets you closer to what matters in life, and brings your world together.
For a richer, simpler and fuller life.
To find out more, click here http://www.samsung.com/galaxys4/

Samsung Introduces the GALAXY S 4 – A Life Companion for a richer, simpler and fuller life [Samsung press release, March 14, 2013]

… Samsung GALAXY S 4 will be available from Q2 globally [in UK: from April 26th but the Qualcomm Quad-Core; in US: Pre Order with Octa-Core … Will Ship on Date 30 April By Fedex] including US, partnering with AT&T, Sprint, T-Mobile, Verizon Wireless, as well as US Cellular and Cricket. In Europe, Samsung GALAXY S 4 is partnering with global mobile operators such as Deutsche Telecom, EE, H3G, Orange, Telenor, Telia Sonera, Telefonica, and Vodafone. …

AP

  • 1.9 GHz [Qualcomm] Quad-Core Processor / 1.6 GHz [Samsung] Octa-Core Processor
  • The selection of AP will be differed by markets.

70% of first Galaxy S4s to come with Snapdragon 600 CPU. Samsung LSI couldn’t make enough Exynos 5 Octas in time [Unwired.com, March 25, 2013]

70% of the first 10 million Samsung Galaxy S4 production batch will come with Qualcomm Snapdragon 600 CPU, instead of its own Exynos 5 Octa, Korean ETNEws reports.
Samsung’s LSI division, responsible for the next generation Exynos CPU, failed to iron out the production and performance issues to have enough chips in time for the Galaxy S4 launch. A couple of weeks ago Samsung announced that the Exynos 5 Octa applications processor is scheduled for mass production only in Q2 2013, which is too late for the huge volumes of Galaxy S4 shipments that will start in late April.
Last year Samsung already faced production problems with Galaxy S3 and lost a lot of sales in early summer because of it. This year, Sammy doubled the initial sales forecasts for the new flagship and wants to sell 40 million of them in the first three months. So instead of risking the chip supply shortages, they are now turning to Qualcomm for Snapdragon 600 CPU, which was initially slated to go mostly to U.S. versions of SGS4.
Taking a step back to fix the production and performance issues of one of the most important parts in your flagship device is a smart thing to do. If you launch your new top-of-the-line phone with serious quality issues, the initial bad press can be fatal to your plans to sell 100 million of them over the product lifecycle.
Going with a tried-and-true chip like the Snapdragon 600, which you know will perform as it should, is the best way for Samsung for now, especially since most users won’t notice the difference and won’t care anyway.

Samsung Announces the Availability of Exynos 5 Octa for New Generation of Mobile Devices [press release, March 15, 2013] (internal name: Exynos 5410)

Samsung Electronics Co., Ltd., a world leader in advanced semiconductor solutions, announced that its new Exynos 5 Octa application processor is scheduled for mass-production in the second quarter of 2013.
As highlighted at CES 2013, the Exynos 5 Octa is the world’s first mobile application processor to implement the new concept of processing architecture, big.LITTLE™, based on the Cortex-A15™ CPU to offer optimal core use. By housing a total of eight cores to draw from (four powerful Cortex-A15™ cores for processing-intense tasks along with Cortex-A7™ quad cores for lighter workloads), the Exynos 5 Octa enables mobile devices to achieve maximum performance. This approach offers up to 70 percent energy saving when performing various tasks, compared to using Cortex-A15™ cores only.

The newest Exynos processor will be manufactured using Samsung’s latest 28-nanometer (nm) HKMG (High-k Metal Gate) low power process and power-saving design, which increases the power efficiency of the processor by minimizing the static current leakage.

The Samsung Exynos 5 Octa enhances the powerful 3D graphics processing capabilities by more than two-times over the Exynos 4 Quad.
With today’s advanced display technology transitioning toward ever higher and sharper resolutions, the Exynos 5 Octa is powerful enough to drive WQXGA (2560×1600) display, the best crystal-clear resolution currently available for mobile devices, enabling users to enjoy crisper video images on their premium smartphones and tablets.
By adopting e-MMC (embedded multimedia card) 5.0 and USB 3.0 interface for the first time in the industry, the new Exynos application processor boasts fast data transfer speed, a feature that is increasingly required to support advanced processing power on mobile devices so that users can fully experience upgraded mobile computing such as faster booting, web browsing and 3D game loading.
The Samsung Exynos 5 Octa incorporates a full HD 60fps (frame per second) video hardware codec engine for 1080p video recording and play-back, an embedded 13 mega-pixel 30fps image signal processor interface for high-quality camera functionality, and 12.8GB/s memory bandwidth interface that enables Full HD Wifi display.

Samsung Exynos at MWC 2013: Exynos 5 Octa Explained [SamsungExynos YouTube channel, March 14, 2013]

This animated display for the Exynos 5 Octa mobile processor was featured in the Samsung Exynos booth at Mobile World Congress 2013. Samsung’s Exynos 5 Octa is the industry’s first ARM® big.LITTLE™-enabled mobile application processor (AP). The Exynos 5 Octa pairs ultra-efficient ARM® Cortex™-A7 (LITTLE) cores with Cortex™-A15 (big) cores designed for the highest performance. This new system-on-chip (SoC) uses LITTLE cores to handle tasks like emailing, light web search and map navigation and uses the big cores for heavy-duty applications like graphic-intensive gaming. Find out more about how Samsung Exynos is driving the discovery of what’s possible: http://www.samsung.com/global/business/semiconductor/minisite/Exynos/index.html

ARM® Big.LITTLE™ Technology Demo on Exynos 5 Octa Reference Tablet at MWC 2013 [SamsungExynos YouTube channel, March 19, 2013]

ARM’s Eric Gowland demoed ARM® big.LITTLE™ processing technology on an Exynos 5 Octa reference tablet in the ARM booth at Mobile World Congress 2013. Gowland showed us the big.LITTLE-enabled Exynos 5 Octa reference platform running a series of benchmarks for tablet activities like web browsing, video playback, graphics rendering and map navigation. In addition to displaying the CPU migration as the processor switched between activities, the demo showed the relative energy usage throughout, highlighting the extreme power efficiency of big.LITTLE architecture. To learn more about ARM® big.LITTLE™ technology, visit our MWC 2013 webpage: http://www.samsung.com/global/business/semiconductor/minisite/Exynos/index.html You can also find more information on ARM’s specialized microsite: http://thinkbiglittle.com/

Samsung Exynos at MWC 2013: Low-Power High K Metal Gate (HKMG) Process Technology [SamsungExynos YouTube channel, March 14, 2013]

Samsung’s Low-Power High K Metal Gate (HKMG) advanced process technology was featured in this animated display inside the Exynos booth at Mobile World Congress 2013. It demonstrates the progression in process technology from 90nm to 28nm, which has resulted in greater speeds and energy-efficiency in Exynos mobile application processors (APs) developed with the technology. For example, the Exynos 5 Octa can offer up to 70% in energy savings thanks to Samsung’s HKMG process. To learn more about Samsung’s HKMG advanced process technology, visit our website: http://www.samsung.com/global/business/semiconductor/minisite/Exynos/index.html

big.LITTLE Processing [ARM technology site, March 20, 2013] [Linaro internal: IKS (In Kernel Switcher)]

ARM big.LITTLE™ processing is an energy saving technology where the highest performance ARM CPUs are combined with the most efficient ARM CPUs in a combined processor subsystem to deliver greater performance at lower power than today’s best-in-class systems. With big.LITTLE processing, software workloads are dynamically and instantly transitioned to the appropriate CPU based on performance needs. This software load balancing is so fast that it is completely seamless to the user. By selecting the optimum processor for each task, big.LITTLE can reduce energy consumption in the processor by 70% or more on light workloads and background tasks, and by 50% for moderately intense work, while still delivering the peak performance of the high performance cores.

More information can be found below or on the Think big.LITTLE microsite

Software

Software can control the allocation of threads of execution to the appropriate core, or in some versions of the software simply move the whole processor context up to big or down to LITTLE based on measured load. There are two software approaches to handling the CPU selection decision, described below. In both software approaches, cache coherence is required to enable the software to quickly move execution from LITTLE to big and from big to LITTLE as appropriate. Cache coherence allows one CPU cluster to look up in the caches of the other CPU cluster, and full hardware cache coherence between the two clusters is key to making big.LITTLE software fast and transparent. Cache coherence can be provided by the ARM CCI-400 cache coherent interconnect or any interconnect that follows the AMBA4 ACE protocol.             

In big.LITTLE SoCs, the OS kernel dynamically and seamlessly moves tasks between the ‘big’ and ‘LITTLE’ CPUs. In reality, this is an extension of the operating system power management software in wide use today on mobile phone SoCs.

Most OS kernels already support Symmetric Multi-core Processing (SMP) and those techniques can easily be extended to support big.LITTLE systems. There are two main variants of big.LITTLE software scheduling.

big.LITTLE CPU Migration [Linaro internal: IKS (In Kernel Switcher) or simply the big.LITTLE.Switcher project]

In CPU migration, the whole workload of a CPU is moved to a different CPU once the OS detects that it requires more or less performance. This builds on generic techniques in an OS to wake up and put to sleep CPUs in an SMP system. The key extension is detecting that a CPU is running at maximum frequency while still requesting further performance, so that the workload needs to be moved to a ‘bigger’ CPU. Once the workload has reduced, it can be moved back to a ‘smaller’ CPU.


This CPU migration software is available today from Linaro [was released to Linaro partners on Dec 20, 2012 as part of Linaro 12.2 release], and is being actively developed by multiple ARM partners [while Linaro continues to fix bugs on it].

big.LITTLE MP [the final name now is Global Task Scheduling (GTS)]

Task migration (aka big.LITTLE MP [as in the Linaro internal project]) detects a high intensity task and will schedule that onto a ‘big’ CPU. Similarly it will detect a low intensity task and move this back to a ‘LITTLE’ core.
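As a minimal sketch of the task-migration idea, assume each task carries a tracked load value normalized to 0..1024 and that the scheduler uses two thresholds: an up threshold for moving a task to a ‘big’ CPU and a lower down threshold for moving it back. The gap between them keeps a task whose load hovers near a single cut-off from bouncing between clusters. The threshold values and the `place` helper are hypothetical illustrations, not ARM’s or Linaro’s tuned parameters.

```python
from dataclasses import dataclass

# Hypothetical thresholds on a 0..1024 load scale (illustrative values only).
UP_THRESHOLD = 700
DOWN_THRESHOLD = 256

@dataclass
class Task:
    name: str
    load: int               # tracked load, 0..1024
    cluster: str = "LITTLE"

def place(task: Task) -> str:
    """Move a task between clusters using separate up/down thresholds."""
    if task.cluster == "LITTLE" and task.load > UP_THRESHOLD:
        task.cluster = "big"      # high-intensity task: up-migrate
    elif task.cluster == "big" and task.load < DOWN_THRESHOLD:
        task.cluster = "LITTLE"   # low-intensity task: down-migrate
    return task.cluster           # otherwise stay where it is

t = Task("game", load=900)
print(place(t))   # big
t.load = 500      # between the thresholds: no migration
print(place(t))   # big
t.load = 100
print(place(t))   # LITTLE
```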


The advantage of task migration over CPU migration is that a system can benefit from all of its CPUs at the same time if the processing demands are extremely high. For example, in a 2x ‘big’ + 2x ‘LITTLE’ system, all 4 CPUs can be used at peak demand times, whereas CPU migration would only be able to use 2 CPUs.
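The core-count comparison above reduces to simple arithmetic. This sketch assumes that under CPU migration each big core is paired with one LITTLE core and only one core per pair runs at a time (unpaired cores stay off), while task migration can schedule on every core at once. It reproduces the 2+2 example and, for comparison, the 4+4 Exynos 5 Octa layout.

```python
def usable_cpus(big: int, little: int, model: str) -> int:
    """Peak number of simultaneously running CPUs under each migration model."""
    if model == "cpu_migration":
        # One active core per big/LITTLE pair; unpaired cores assumed powered off.
        return min(big, little)
    if model == "task_migration":
        # The scheduler can place tasks on every core at the same time.
        return big + little
    raise ValueError(f"unknown model: {model}")

print(usable_cpus(2, 2, "cpu_migration"))   # 2
print(usable_cpus(2, 2, "task_migration"))  # 4
print(usable_cpus(4, 4, "cpu_migration"))   # 4
print(usable_cpus(4, 4, "task_migration"))  # 8
```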

[According to Vincent Guittot at Linaro Connect 2013 (March 4–8) in Hong Kong, Linaro will release the big.LITTLE MP prototype for external testing in mid-2013]

ARM and Linaro have been developing Linux support for both migration models. For more information go to:

Embedded Linux Conference 2013 – In Kernel Switcher [IKS]: A Solution [TheLinuxFoundation YouTube channel, recorded Feb 22, published March 1, 2013], slides are downloadable in PDF format

The Linux Foundation Embedded Linux Conference 2013, San Francisco, California. In Kernel Switcher: A Solution to Support ARM’s New big.LITTLE Implementation, by Mathieu Poirier. The ‘In Kernel Switcher’ (IKS) is a solution developed by Linaro and ARM to support ARM’s new big.LITTLE implementation. It pairs an A7 (LITTLE) and an A15 (big) processor into a logical entity that is then presented to the kernel as one CPU. From there, the solution seeks to achieve optimal performance and power consumption by switching between the big and the LITTLE core based on system usage. This session will present the IKS solution. After giving an overview of the big.LITTLE processor, we will present the solution itself, how frequencies are masqueraded to the cpufreq core, the steps involved in doing a “switch” between cores and some of the optimisations made to the interactive governor. The session will conclude by presenting the results that we obtained as well as a brief overview of Linaro’s upstreaming plan.

ELC: In-kernel switcher [IKS] for big.LITTLE [LWN.net, Feb 27, 2013]

The ARM big.LITTLE architecture has been the subject of a number of LWN articles (here’s another) and conference talks, as well as a fair amount of code. A number of upcoming systems-on-chip (SoCs) will be using the architecture, so some kind of near-term solution for Linux support is needed. Linaro’s Mathieu Poirier came to the 2013 Embedded Linux Conference to describe that interim solution: the in-kernel switcher.
Two kinds of CPUs
Big.LITTLE incorporates architecturally similar CPUs that have different power and performance characteristics. The similarity must consist of a one-to-one mapping between instruction sets on the two CPUs, so that code can “migrate seamlessly”, Poirier said. Identical CPUs are grouped into clusters.
The SoC he has been using for testing consists of three Cortex-A7 CPUs (LITTLE: less performance, less power consumption) in one cluster and two Cortex-A15s (big) in the other. The SoC was deliberately chosen to have a different number of processors in the clusters as a kind of worst case to catch any problems that might arise from the asymmetry. Normally, one would want the same number of processors in each cluster, he said.
The clusters are connected with a cache-coherent interconnect, which can snoop the cache to keep it coherent between clusters. There is an interrupt controller on the SoC that can route any interrupt from or to any CPU. In addition, there is support in the SoC for I/O coherency that can be used to keep GPUs or other external processors cache-coherent, but that isn’t needed for Linaro’s tests.
The idea behind big.LITTLE is to provide a balance between power consumption and performance. The first idea was to run CPU-hungry tasks on the A15s, and less hungry tasks on the A7s. Unfortunately, it is “hard to predict the future”, Poirier said, which made it difficult to make the right decisions because there is no way to know what tasks are CPU intensive ahead of time.
Two big.LITTLE approaches
That led Linaro to a two-pronged approach to solving the problem: Heterogeneous Multi-Processing (HMP) and the In-Kernel Switcher (IKS). The two projects are running in parallel and are both in the same kernel tree. Not only that, but you can enable either on the kernel command line or switch at run time via sysfs.
With HMP, all of the cores in the SoC can be used at the same time, but the scheduler needs to be aware of the capabilities of the different processors to make its decisions. It will lead to higher peak performance for some workloads, Poirier said. HMP is being developed in the open, and anyone can participate, which means it will take somewhat longer before it is ready, he said.
IKS is meant to provide a “solution for now”, he said, one that can be used to build products with. The basic idea is that one A7 and one A15 are coupled into a single virtual CPU. Each virtual CPU in the system will then have the same capabilities, thus isolating the core kernel from the asymmetry of big.LITTLE. That means much less code needs to change.
Only one of the two processors in a virtual CPU is active at any given time, so the decision on which of the two to use can be made at the CPU frequency (cpufreq) driver level. IKS was released to Linaro members in December 2012, and is “providing pretty good results”, Poirier said.
An alternate way to group the processors would be to put all the A15s together and all the A7s into another group. That turned out to be too coarse as it was “all or nothing” in terms of power and performance. There was also a longer synchronization period needed when switching between those groups. Instead, it made more sense to integrate “vertically”, pairing A7s with A15s.
For the test SoC, the “extra” A7 was powered off, leaving two virtual CPUs to use. The processors are numbered (A15_0, A15_1, A7_0, A7_1) and then paired up (i.e. {A15_0, A7_0}) into virtual CPUs; “it’s not rocket science”, Poirier said. One processor in each group is turned off, but only the cpufreq driver and the switching logic need to know that there are more physical processors than virtual processors.
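The pairing just described can be sketched as follows. The physical CPU names follow Poirier’s numbering (A15_0, A7_0, ...), but `pair_cpus` is a hypothetical illustration, not Linaro’s code. Each virtual CPU gets one A15 and one A7, any processor without a partner stays powered off, and only one member of each pair is active at a time. The sketch assumes each pair starts on its A7.

```python
def pair_cpus(n_big: int, n_little: int):
    """Pair big and LITTLE CPUs into virtual CPUs, IKS-style (illustrative)."""
    n_pairs = min(n_big, n_little)
    virtual = {
        vcpu: {"big": f"A15_{vcpu}", "little": f"A7_{vcpu}",
               "active": "little"}          # assumed starting core
        for vcpu in range(n_pairs)
    }
    # Cores without a partner are never used by the switcher.
    powered_off = ([f"A15_{i}" for i in range(n_pairs, n_big)]
                   + [f"A7_{i}" for i in range(n_pairs, n_little)])
    return virtual, powered_off

vcpus, off = pair_cpus(n_big=2, n_little=3)              # Linaro's test SoC
print(len(vcpus), vcpus[0]["big"], vcpus[0]["little"])  # 2 A15_0 A7_0
print(off)                                              # ['A7_2']
```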
The virtual CPU presents a list of operating frequencies that encompass the range of frequencies that both A7 and A15 can operate at. While the numbers look like frequencies (ranging from 175MHz to 1200MHz in the example he gave), they don’t really need to be as they are essentially just indexes into a table in the cpufreq driver. The driver maps those values to a real operating point for one of the two processors.
Switching CPUs
The cpufreq core is not aware of the big.LITTLE architecture, so the driver does a good bit of work, Poirier said, but the code for making the switching decision is simple. If the requested frequency can’t be supported by the current processor, switch to the other. That part is eight lines of code, he said.
For example, if virtual CPU 0 is running on the A7 at 200MHz and a request comes in to go to 1.2GHz, the driver recognizes that the A7 cannot support that. In that case, it decides to power down the A7 (which is called the outbound processor) and power up the A15 (inbound). There is a synchronization process that happens as part of the transition so that the inbound processor can use the existing cache. That process is described in Poirier’s slides [PDF], starting at slide 17.
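A minimal sketch of the cpufreq-level decision, under stated assumptions: the virtual frequency table below is made up. Only its 175–1200 range follows Poirier’s example. The A7 entries are assumed to be reported at half their real frequency so that they sort below the A15 entries. The rule itself is the one Poirier described: if the requested frequency cannot be served by the current processor, switch to the other one. `VIRTUAL_TABLE` and `set_target` are hypothetical names.

```python
# Hypothetical table: virtual frequency (MHz) -> (core type, real frequency in MHz).
VIRTUAL_TABLE = {
    175: ("A7", 350),
    200: ("A7", 400),
    250: ("A7", 500),
    300: ("A7", 600),
    600: ("A15", 600),
    800: ("A15", 800),
    1000: ("A15", 1000),
    1200: ("A15", 1200),
}

def set_target(current: str, requested: int) -> tuple[str, int, bool]:
    """Return (core to run on, real frequency, whether a switch happened)."""
    # Lowest virtual entry that satisfies the request, capped at the top entry.
    virtual = min((f for f in VIRTUAL_TABLE if f >= requested),
                  default=max(VIRTUAL_TABLE))
    core, real = VIRTUAL_TABLE[virtual]
    # The switching rule: if the current core can't serve this entry, switch.
    return core, real, core != current

# Running on the A7 when a 1.2GHz request arrives:
print(*set_target("A7", 1200))   # A15 1200 True
```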
The outbound processor powers up the inbound and continues executing normal kernel/user-space code until it receives the “inbound alive” signal. After sending that signal, the inbound processor initializes both the cluster and interconnect if it is the first in its cluster (i.e. the other processor of the same type, in the other virtual CPU is powered down). It then waits for a signal from the outbound processor.
Once the outbound processor receives “inbound alive” signal, the blackout period (i.e. time when no kernel or user code is running on the virtual CPU) begins. The outbound processor disables interrupts, migrates the interrupt signals to the inbound processor, then saves the current CPU context. Once that’s done, it signals the inbound processor, which restores the context, enables interrupts, and continues executing from where the outbound processor left off. All of that is possible because the instruction sets of the two processors are identical.
As part of its cleanup, the outbound processor creates a new stack for itself so that it won’t interfere with the inbound. It then flushes the local cache and checks to see if it is the last one standing in its cluster; if so, it flushes the cluster cache and disables the cache-coherent interconnect. It then powers itself off.
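The outbound/inbound handshake can be modelled with two threads and two events. This is only a user-space sketch of the ordering Poirier describes, with hypothetical function and event names. The real code runs in the kernel, and its power-controller, interrupt and cache-maintenance operations appear here only as log entries. The sketch assumes the inbound core is the first in its cluster and the outbound core is the last in its own. Lines from the two threads may interleave differently from run to run, but the two signals enforce the orderings that matter.

```python
import threading

log = []                             # ordered trace of protocol steps
log_lock = threading.Lock()

def record(step):
    with log_lock:
        log.append(step)

inbound_alive = threading.Event()    # inbound -> outbound: "inbound alive"
context_ready = threading.Event()    # outbound -> inbound: context saved
saved_context = {}

def inbound(first_in_cluster):
    record("inbound: powered up")
    inbound_alive.set()
    if first_in_cluster:
        record("inbound: init cluster and interconnect")
    context_ready.wait()             # wait for the outbound processor's signal
    record(f"inbound: restore context pc={hex(saved_context['pc'])}")
    record("inbound: enable interrupts, resume")

def outbound(last_in_cluster, inbound_first_in_cluster):
    record("outbound: power up inbound")
    peer = threading.Thread(target=inbound, args=(inbound_first_in_cluster,))
    peer.start()
    # Real code keeps running kernel/user-space work until this signal arrives.
    inbound_alive.wait()
    record("outbound: blackout begins, disable interrupts")
    record("outbound: migrate interrupts to inbound")
    saved_context["pc"] = 0x1234     # stand-in for the saved CPU context
    record("outbound: context saved")
    context_ready.set()
    record("outbound: switch to new stack, flush local cache")
    if last_in_cluster:
        record("outbound: flush cluster cache, disable interconnect")
    record("outbound: power off")
    peer.join()

outbound(last_in_cluster=True, inbound_first_in_cluster=True)
for step in log:
    print(step)
```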
There are some pieces missing from the picture that he painted, Poirier said, including “vlocks” and other mutual exclusion mechanisms to handle simultaneous desired cluster power states. Also missing was discussion of the “early poke” mechanism as well as code needed to track the CPU and cluster states.
Performance
One of Linaro’s main targets is Android, so it used the interactive power governor for its testing. Any governor will work, he said, but will need to be tweaked. A second threshold (hispeed_freq2) was added to the interactive governor to delay going into “overdrive” on the A15 too quickly as those are “very power hungry” states.
For testing, BBench was used. It gives a performance score based on how fast web pages are loaded. That was run with audio playing in the background. The goal was to get 90% of the performance of two A15s, while using 60% of the power, which was achieved. Different governor parameters gave 95% performance with 65% of the power consumption.
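To put the BBench figures in perspective, performance per watt relative to the two-A15 baseline is the performance ratio divided by the power ratio. The calculation below only reworks the percentages quoted above and adds no new measurements.

```python
def relative_perf_per_watt(perf_ratio: float, power_ratio: float) -> float:
    """Perf/W relative to the two-A15 baseline (baseline = 1.0)."""
    return perf_ratio / power_ratio

for perf, power in [(0.90, 0.60), (0.95, 0.65)]:
    print(f"{perf:.0%} perf at {power:.0%} power -> "
          f"{relative_perf_per_watt(perf, power):.2f}x perf/W")
# 90% perf at 60% power -> 1.50x perf/W
# 95% perf at 65% power -> 1.46x perf/W
```

Both tunings therefore deliver roughly one and a half times the work per joule of the two-A15 baseline. The second tuning trades a little efficiency for five more points of performance.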
It is important to note that tuning is definitely required: without it you can do worse than the performance of two A7s. “If you don’t tune, all efforts are wasted”, Poirier said. The interactive governor has 15-20 variables, but Linaro mainly concentrated on hispeed_load and hispeed_freq (and the corresponding *2 parameters added for handling overdrive). The basic configuration had the virtual CPU run on the A7 until the load reached 85%, when it would switch to the first six (i.e. non-overdrive) frequencies on the A15. After 95% load, it would use the two overdrive frequencies.
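The tuned policy just described can be sketched as a two-threshold selector: stay on the A7 below 85% load, use the six non-overdrive A15 frequencies from there, and unlock the two overdrive frequencies only at 95%. The frequency values are hypothetical, and only the counts (six normal plus two overdrive A15 steps) come from the text. The name `HISPEED_LOAD2` is inferred from the “*2 parameters” remark. The real interactive governor computes its target frequency with many more tunables than this.

```python
A7_STEPS = [175, 200, 250, 300, 350, 400, 500, 600]   # hypothetical A7 steps
A15_NORMAL = [650, 700, 800, 900, 1000, 1100]         # first six A15 steps
A15_OVERDRIVE = [1150, 1200]                          # two overdrive steps

HISPEED_LOAD = 85    # % load at which the virtual CPU moves to the A15
HISPEED_LOAD2 = 95   # assumed name: % load at which overdrive is allowed

def allowed_steps(load_pct: float) -> list[int]:
    """Frequencies the tuned governor may use at a given CPU load."""
    if load_pct < HISPEED_LOAD:
        return A7_STEPS                   # stay on the A7
    if load_pct < HISPEED_LOAD2:
        return A15_NORMAL                 # A15, no overdrive yet
    return A15_NORMAL + A15_OVERDRIVE     # overdrive unlocked

def core_for(freq: int) -> str:
    return "A7" if freq in A7_STEPS else "A15"

for load in (40, 85, 94, 97):
    steps = allowed_steps(load)
    print(load, core_for(steps[0]), steps[-1])
```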
The upstreaming process has started, with the cluster power management code getting “positive remarks” on the ARM Linux mailing list. The goal is to upstream the code entirely, though some parts of it are only available to Linaro members at the moment. The missing source will be made public once a member ships a product using IKS. But, IKS is “just a stepping stone”, Poirier said, and “HMP will blow this out of the water”. It may take a while before HMP is ready, though, so IKS will be available in the meantime.

Exynos Octa and why you need to stop the drama about the 8 cores [XDA Developers, March 15, 2013]

I’m going to write this as a guide/information page so that we can put an end, as soon as possible, to the stupid discussions about how 8 cores are useless.
What’s it all about?
The Exynos Octa, or Exynos 5410, is a big.LITTLE design engineered by ARM and is the first consumer implementation of this technology. Samsung was ARM’s lead partner in bringing it to market first. Renesas is the other current chip designer that has publicly announced a big.LITTLE design.
    • Misconception #1: Samsung didn’t design this, ARM did. This is not some stupid marketing gimmick.

      The point of the design is to meld the advantages of the A7 processor architecture, with its extreme power efficiency, with those of the A15 architecture, with its extreme performance at the cost of power consumption. The A7 cores are slightly slower than an A9 equivalent, but use much less power. The A15 cores are in another ballpark in terms of performance, but their power consumption is also extreme on the current manufacturing generation.
      The effective goal is to achieve the best of both worlds. Qualcomm, on the other hand, does this by using its own architecture, which is similar in some design aspects to the A15 architecture but compromises on features and performance to achieve higher power efficiency. The end result for the user can be expressed in two measurements: IPC (instructions per clock) and Perf/W (performance per watt).
      In terms of IPC, the A15 leads the pack by quite a margin, followed by Krait 400, Krait 300, Krait 200, A9, A7, and A8 cores, in that order.
      In terms of Perf/W, the A7 leads by a margin, followed by the A9s and the Krait cores, with the A15 a distant last in terms of efficiency.
      Real-world use
      Of course, the Exynos Octa is the first to use this:


      Currently, the official word seems to be that the A7 cluster is configured to run from 200 to 1200MHz, and the A15 cluster from 200 to 1600MHz.
      There are several use-cases of how the design can be used, and it is purely limited by software, as the hardware configuration is completely flexible.
      In-Kernel Switcher (IKS)
      This is what most of us will see in our consumer products this year. Effectively, you only have a virtual quad-core processor. The A15 cores are paired up with the A7 cores: each A15 has a corresponding A7 “partner”. Hardware-wise, this pairing has no physical representation, as an actual die shot of the Exynos Octa shows.
      The IKS does the same thing as a CPU governor. But instead of switching CPU frequency depending on the load, it will switch between CPUs.


      Effectively, you are jumping from one performance/power curve to another. And that’s it. Nothing more, nothing less.
      The actual implementation is a very simple driver on the side of the kernel which measures load and acts much like a CPU governor.
      [PhoneArena YouTube channel, Feb 25, 2013] For more details, check out our web site: http://www.phonearena.com/ PhoneArena presents a video demonstration of the new Samsung Exynos 5 Octa chipset – the manufacturer’s first octa-core processor! As you can imagine, the Exynos 5 Octa is very new and not available in any handset yet, but we expect it to make an appearance in the Galaxy S IV! So, it’s definitely worth checking!
      The above is a demonstration; you can see how at most times the A7 cores are used for video playback, simple tasks, and miscellaneous computations. The A15 cores will kick in when there is more demanding load being processed, and then quickly drop out again to the A7 cores when it’s not doing much anymore.
      • Misconception #2: You DON’T need to have all 8 cores online, actually, only maximum 4 cores will ever be online at the same time.
      • Misconception #3: If the workload is thread-light, big.LITTLE pairs will simply remain offline under such light loads, just as with hot-plugging on previous CPUs. There is no wasted power, thanks to power-gating.
      • Misconception #4: As mentioned, each pair can switch independently of the other pairs. It’s not the whole cluster that switches between A15 and A7 cores. You can have only a single A15 online, together with two A7s, while the fourth pair is completely offline.
      • Misconception #5: The two clusters have their own frequency planes. This means A15 cores all run on one frequency while the A7 cores can be running on another. However, inside of the frequency planes, all cores run at the same frequency, meaning there is only one frequency for all cores of a type at a time.
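Misconceptions #2 to #5 can be captured in a small state model. This sketch assumes four pairs, each independently off, on its A7 or on its A15. It also assumes a cluster’s shared frequency is set to the highest frequency requested by any pair currently running in that cluster. That is a common policy for shared frequency domains, but the post does not state it. `summarize` is a hypothetical helper.

```python
from collections import Counter

def summarize(pairs: list[str], requests: list[int]) -> dict:
    """pairs[i] is 'off', 'A7' or 'A15'; requests[i] is that pair's wanted MHz."""
    if not (len(pairs) == len(requests) == 4):
        raise ValueError("expected four big/LITTLE pairs")
    online = Counter(p for p in pairs if p != "off")
    # One frequency per cluster: the highest request among its online pairs.
    cluster_freq = {
        core: max(r for p, r in zip(pairs, requests) if p == core)
        for core in online
    }
    return {"online_cores": sum(online.values()),
            "per_type": dict(online),
            "cluster_freq": cluster_freq}

# Misconception #4's example: one A15, two A7s, fourth pair offline.
print(summarize(["A15", "A7", "A7", "off"], [1600, 600, 900, 0]))
# {'online_cores': 3, 'per_type': {'A15': 1, 'A7': 2}, 'cluster_freq': {'A15': 1600, 'A7': 900}}
```

Because each pair contributes at most one online core, `online_cores` can never exceed four, which is the point of misconception #2.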
      Heterogeneous Multi-Processing (HMP)
      This is the other actual implemented function mode of a big.LITTLE CPU. In this case, all 8 cores can be used simultaneously by the system.
      This is a vastly more complex working mechanism, and its implementation is also an order of magnitude more sophisticated. It requires the kernel scheduler to actually be aware of the difference between the A7 and A15 cores. Currently, the Linux kernel is not capable of doing this and treats all CPUs as equals. This is a problem, since we do not want to use the A15 cores when a task can simply be processed on an A7 core at a much lower power cost.
      The Linaro working-group already finished the first implementation of the HMP design as a series of patches to be applied against the Linux 3.8 kernel. What they did is to make the scheduler smart enough to be able to track the load of single process entities, and with that information to schedule the threads smartly on either the A7 cores or the A15 cores. This achieves much lower latency in terms of switching workloads, or better said, switching the environments (CPUs) to the respective work-loads, and exposes the full processing capabilities of the silicon as all cores can be used at once.
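The per-task load tracking that such a scheduler relies on can be illustrated with a geometrically decaying average. This sketch assumes the convention of Linux per-entity load tracking: each period of roughly 1 ms decays by a factor y with y^32 = 0.5, so activity from 32 periods ago counts half, and load is expressed as the ratio of decayed runnable time to decayed elapsed time, scaled to 1024. The exact formulas and thresholds in the big.LITTLE MP patch set may differ. `TaskLoad` is a hypothetical class.

```python
Y = 0.5 ** (1 / 32)   # per-period decay factor: Y**32 == 0.5

class TaskLoad:
    """Decayed runnable-time ratio for one task (illustrative sketch)."""

    def __init__(self):
        self.runnable_sum = 0.0   # decayed sum of runnable time
        self.period_sum = 0.0     # decayed sum of elapsed time

    def update(self, runnable_fraction: float) -> None:
        """Account one period in which the task was runnable for this fraction."""
        self.runnable_sum = self.runnable_sum * Y + runnable_fraction
        self.period_sum = self.period_sum * Y + 1.0

    @property
    def load(self) -> int:
        """Load scaled to 0..1024."""
        if self.period_sum == 0:
            return 0
        return round(1024 * self.runnable_sum / self.period_sum)

task = TaskLoad()
for _ in range(50):
    task.update(1.0)   # fully busy
print(task.load)       # 1024
for _ in range(32):
    task.update(0.0)   # then idle: the load decays toward 0
print(task.load)
```

A scheduler can compare this value against up and down thresholds to decide whether a task belongs on an A15 or an A7 core.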
      You can follow the advancements of this in the publications of the Linaro Connect summits that happen every few months. The code was only published in the middle of February this year for the first working implementation equivalent in power consumption to the IKS.
      • Misconception #6: Yes, the CPU is a true 8-core processor. It’s just not being used as such in its initial software implementations.

      big.LITTLE In Kernel Switcher [IKS] by Nicolas Pitre and Viresh Kumar [Charbax YouTube channel, March 16, 2013]

      Nicolas Pitre and Viresh Kumar are part of the core team from Linaro that is working on developing future solutions for the latest ARM architecture: big.LITTLE. Here they discuss some of the internals of the famous IKS solution. They are joined by Naresh Kamboju, who is part of the QA team working for Linaro. This team, together with a few more members, was named “Outstanding team for 2012” for their work on IKS. Filmed at Linaro Connect 2013 [March 4-8] in Hong Kong.

      Vincent Guittot on the Linaro big.LITTLE MP work [Charbax YouTube channel, April 1, 2013]

      Vincent Guittot, Linaro assignee from ST-Ericsson, talks about the work being done at Linaro to extend the Linux kernel to support ARM’s big.LITTLE MP architecture, building on the features provided by the big.LITTLE Switcher project. The most powerful use model of big.LITTLE is called MP and enables the use of all physical cores at the same time. In this case, threads that are high-priority and/or computationally intensive can be allocated to the A15 cores, while threads with lower priority or lower computational intensity, such as background tasks, can be handled by the A7 cores. Filmed at Linaro Connect 2013 [March 4-8] in Hong Kong.