Intel targeting ARM based microservers: the Calxeda case

Core information:

  • Intel Atom processor S1200 vs. Calxeda ECX1000 for microservers
  • ARM Holdings on the server opportunity
  • x86 on ARM with Linux
  • Boston Ltd. related information from Calxeda
  • Background on Elbrus (in Russian or English if available)
  • Background on Elbrus Technologies (in Russian)

See also: 
Binary translation [Wikipedia, Sept 2, 2012]
Calxeda [Wikipedia, Nov 12, 2012]
Microserver (Server appliance) [Wikipedia, Nov 5, 2012]


Intel Atom processor S1200 vs. Calxeda ECX1000 for microservers

Intel Delivers the World’s First 6-Watt Server-Class Processor [Intel press release, Dec 11, 2012]

Several Equipment Makers Building Microservers, Storage and Networking Systems Based on 64-bit Intel® Atom™ Processor S1200 Product Family

NEWS HIGHLIGHTS

  • Intel® Atom™ processor S1200 server system on-chip hits lower-power levels, and includes key features such as error code correction, 64-bit support, and virtualization technologies required for use inside data centers.
  • More than 20 low-power designs including microservers, storage and networking systems use the Intel Atom processor S1200 family.


Intel Corporation introduced the Intel® Atom™ processor S1200 product family today, delivering the world’s first low-power, 64-bit server-class system-on-chip (SoC) for high-density microservers, as well as a new class of energy-efficient storage and networking systems. The energy-sipping, industrial-strength microprocessor features essential capabilities to achieve server-class reliability, manageability and cost effectiveness.

“The data center continues to evolve into unique segments and Intel continues to be a leader in these transitions,” said Diane Bryant, vice president and general manager of the Datacenter and Connected Systems Group at Intel. “We recognized several years ago the need for a new breed of high-density, energy-efficient servers and other datacenter equipment. Today, we are delivering the industry’s only 6-watt SoC that has key datacenter features, continuing our commitment to help lead these segments.”

Intel’s Next Generation of Microservers: The Real Thing

As public clouds continue to grow, the opportunity to transform companies providing dedicated hosting, content delivery or front-end Web servers is also growing. High-density servers based on low-power processors are able to deliver the desired performance while at the same time significantly reducing energy consumption, one of the biggest cost drivers in the data center. However, before deploying new equipment in data centers, companies look for several critical features.

The Intel Atom processor S1200 product family is the first low-power SoC delivering required data center features that ensure server-class levels of reliability and manageability while also enabling significant savings in overall costs. The SoC includes two physical cores and a total of four threads enabled with Intel® Hyper-Threading Technology (Intel® HT). The SoC also includes 64-bit support, a memory controller supporting up to 8GB of DDR3 memory, Intel® Virtualization Technologies (Intel® VT), eight lanes of PCI Express 2.0, Error-Correcting Code (ECC) support for higher reliability, and other I/O interfaces integrated from Intel chipsets. The new product family will consist of three processors with frequencies ranging from 1.6GHz to 2.0GHz.

The Intel Atom S1200 product family is also compatible with the x86 software that is commonly used in data centers today. This enables easy integration of the new low-powered equipment and avoids additional investments in porting and maintaining new software stacks.

New Milestones in Power Efficiency

Intel continues to drive power consumption down in its products, enabling systems to be as energy efficient as possible. Each year since the 2006 introduction of low-power Intel® Xeon® processors, Intel has delivered a new generation of low-power processors that have decreased the thermal design power (TDP) from 40 watts in 2006 to 17 watts this year due to Intel’s advanced 22-nanometer (nm) process technology. The Intel Atom processor S1200 product family is the first low-power SoC with server-class features offering as low as 6 watts of TDP.

Broad Industry Support

Today, more than 20 low-power designs including microservers, storage and networking systems use the Intel Atom processor S1200 product family from companies including Accusys, CETC, Dell, HP, Huawei, Inspur, Microsan, Qsan, Quanta, Supermicro and Wiwynn.

“Organizations supporting hyperscale workloads need powerful servers to maximize efficiency and realize radical space, cost and energy savings,” said Paul Santeler, vice president and general manager, Hyperscale Business Unit, Industry-standard Servers and Software at HP. “HP servers power many of those organizations, and the Intel Atom processor S1200 will be instrumental as we develop the next wave of application-defined computing to dramatically reduce cost and energy use for our customers.”

An Even Brighter Future

Intel is working on the next generation of Intel Atom processors for extreme energy efficiency codenamed “Avoton.” Available in 2013, Avoton will further extend Intel’s SoC capabilities and use the company’s leading 3-D Tri-gate 22 nm transistors, delivering world-class power consumption and performance levels.

For customers interested in low-voltage Intel® Xeon® processor models for low-power servers, storage and networking, Intel will introduce the new Intel Xeon processor E3 v3 product family based on the “Haswell” microarchitecture next year. These new processors will take advantage of new energy-saving features in Haswell and provide balanced performance-per-watt, giving customers even more options.

Pricing and Availability

The Intel Atom processor S1200 is shipping today to customers with recommended customer price starting at $54 in quantities of 1,000 units.

More information on the announcement, including Diane Bryant’s presentation, additional documents and pictures, is available at http://newsroom.intel.com/docs/DOC-3172.

See also:
Intel® Atom™ Processor S1200 for Microserver: Datasheet, Vol. 1 [Intel, Dec 2012]

Comparing Calxeda ECX1000 to Intel’s new S1200 Centerton chip [‘ARM Servers Now’ blog from Calxeda, Dec 11, 2012]

Based on what Intel disclosed today, here’s a snapshot of Calxeda EnergyCore 1000 vs. Intel’s new S1200 chip:

|               | ECX1000  | Intel S1200 |
|---------------|----------|-------------|
| Watts         | 3.8      | 6.1         |
| Cores         | 4        | 2           |
| Cache (MB)    | 4 Shared | 2 x .5 MB   |
| PCI-E         | 16 lanes | 8 lanes     |
| ECC           | Yes      | Yes         |
| SATA          | Yes      | No          |
| Ethernet      | Yes      | No          |
| Management    | Yes      | No          |
| OOO Execution | Yes      | No          |
| Fabric Switch | 80 Gb    | NA          |
| Fabric ports  | 5        | NA          |
| Address Size  | 32 bits  | 64 bits     |
| Memory Size   | 4 GB     | 8 GB        |

So, while the Centerton announcement indicates that Intel takes “microservers” seriously after all, it falls short of the ARM competition. It DOES have 64-bit support and Intel ISA compatibility, however. Most workloads targeting ARM are interpreted code (PHP, LAMP, Java, etc.), so this is not as big a deal as some would have you believe! Intel did not specify the additional chips required to deliver a real “Server Class” solution like Calxeda’s, but our analysis indicates this could add 10 additional watts PLUS the cost. That would imply the real comparison between ECX and S1200 is ~3.8 vs ~16 watts. So roughly 3-4 times more power for Intel’s new S1200, again, comparing 2 cores to 4. Internal Calxeda benchmarks indicate that Calxeda’s four cores and larger cache deliver 50% more performance compared to the 2 hyper-threaded Atom cores. This translates to a Calxeda advantage of 4.5 to 6 times better performance per watt, depending on the nature of the application.
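The performance-per-watt claim above can be retraced with a few lines of arithmetic. The sketch below only uses the figures quoted in the Calxeda post (3.8 W, 6.1 W, the estimated 10 W of extra chips, and the 50% performance advantage from Calxeda's internal benchmarks); the function name is made up for illustration and nothing here is a measurement.

```python
# Figures quoted in the Calxeda post; none of them are measured here.
ECX1000_WATTS = 3.8            # Calxeda ECX-1000 SoC
S1200_SOC_WATTS = 6.1          # Intel Atom S1200 SoC alone
S1200_EXTRA_WATTS = 10.0       # Calxeda's estimate for the additional chips
CALXEDA_PERF_ADVANTAGE = 1.5   # "50% more performance" (internal Calxeda benchmark)


def perf_per_watt_advantage(power_ratio, perf_ratio=CALXEDA_PERF_ADVANTAGE):
    """Hypothetical helper: relative performance multiplied by relative power draw."""
    return perf_ratio * power_ratio


s1200_system_watts = S1200_SOC_WATTS + S1200_EXTRA_WATTS
exact_power_ratio = s1200_system_watts / ECX1000_WATTS

print(f"S1200 with extra chips: {s1200_system_watts:.1f} W")
print(f"Power ratio vs ECX-1000: {exact_power_ratio:.2f}x")
for ratio in (3.0, 4.0):
    advantage = perf_per_watt_advantage(ratio)
    print(f"{ratio:.0f}x power -> {advantage:.1f}x performance per watt")
```

The quoted range of 4.5 to 6 times corresponds to the rounded power ratios of 3 and 4; the unrounded ratio of the quoted wattages is slightly above 4.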

What is a “Server-Class” SOC? [‘ARM Servers Now’ blog from Calxeda, Dec 12, 2012]

As reported in various outlets yesterday, Intel has released their S1200 line of Atom SOC’s targeting the microserver market with the tagline: “Intel Delivers the World’s First 6-Watt Server-Class Processor”. The first notable point here is that they had to use 6 Watts, because 5 was already taken. The second notable point is their definition of “Server-Class”. Looking at the list of features on the Atom S1200, there are key “Server-Class” features missing:

  • Networking: Intel’s SOC requires you to add hardware for networking
  • Storage: Once again, there is no SATA connectivity included on the Intel SOC, so you must add hardware for that
  • Management: Even microservers need remote manageability features, so again with Intel you need to tack that on to the power and price budgets.

Unless you add additional hardware on top of it, Intel’s SOC allows you to boot and not much else. Let’s also consider the fact that you’ve got a total of 8 lanes of PCI Express Gen 2 on each SOC. If you’d like to add the Server-Class items listed above, choose wisely, because those 8 lanes will go fast. Add all of that hardware, plus memory, and 6 W is simply not possible. And of course these additional components add cost and take space as well.

Let’s expand that thought to an actual Atom S1200 powered system, like the Quanta S900-X31A. Each node includes a Marvell 88SE9130 SATA controller at a TDP of 1W, an Intel i350 1GbE controller at 2.8W TDP, an AST2300M estimated at a conservative 1W, and an SODIMM at roughly 1.2W (using the same number we at Calxeda have used). That adds at least 6 more watts per node, almost doubling the 6.1W TDP of the processor. Multiply that across 48 nodes and you just tacked on 288W to each chassis. In a 42U rack full of them, you just added 4kW to each rack! By no means is that a limitation or shortcoming of the Quanta design, which is actually quite good, but rather an indication of the excess baggage that all vendors will need to deal with in putting together an S1200 powered system.
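The per-node, per-chassis and per-rack overhead quoted above can be reproduced as follows. This is a rough sketch using the component TDPs listed in the post; the number of chassis per rack is not stated there, so the value of 14 (a 3U chassis in a 42U rack) is an assumption chosen because it matches the quoted figure of about 4kW.

```python
# Per-node component TDPs as listed in the Calxeda post (watts).
extra_components_w = {
    "Marvell 88SE9130 SATA controller": 1.0,
    "Intel i350 Ethernet controller": 2.8,
    "AST2300M (conservative estimate)": 1.0,
    "SODIMM": 1.2,
}
S1200_TDP_W = 6.1
NODES_PER_CHASSIS = 48
# Assumption, not stated in the post: 14 chassis fill a 42U rack (3U each).
CHASSIS_PER_RACK = 14

extra_per_node_w = sum(extra_components_w.values())
node_total_w = S1200_TDP_W + extra_per_node_w
extra_per_chassis_w = extra_per_node_w * NODES_PER_CHASSIS
extra_per_rack_w = extra_per_chassis_w * CHASSIS_PER_RACK

print(f"Extra per node:    {extra_per_node_w:.1f} W")
print(f"Total per node:    {node_total_w:.1f} W")
print(f"Extra per chassis: {extra_per_chassis_w:.0f} W")
print(f"Extra per rack:    {extra_per_rack_w / 1000:.1f} kW")
```

The per-node total of roughly 12 W is the budget the post goes on to compare against two 5 W Calxeda nodes.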

The currently shipping Calxeda ECX-1000 Server-Class SOC [available in the Ultimate Data X1 (UDX1) system from Penguin Computing, the Viridis from Boston and the SystemFabriCore from System Fabric Works] ships with SATA, Ethernet fabric links, IPMI-based management, and 8 lanes of PCI Express Gen 2, standard at 3.8W (5W including 4GB DDR3). It’s also worth pointing out that Calxeda’s integrated fabric switch provides more than just the Ethernet ports missing on the Atom S1200. Applied at the system and rack level, it can dramatically reduce Top of Rack Switch ports and cabling complexity, while increasing internode bandwidth by 10-fold. You can have all of that in a 5W server. Not 5W + additional components. Why not take that 12W budget you need for each S1200 node and get two Calxeda nodes with all of the Server-Class features included?

In the end, Intel may simply be claiming 64-bit as the main benchmark for Server-Class. When matching microservers to the appropriate workloads, we’ve found that there is surely a place for 32-bit in the datacenter. We’ll be providing a blog post on that very topic in the near future.

Penguin Computing’s New High Density System Ultimate Data X1 Brings ARM’s Low-power Footprint To The Data Center [Penguin Computing press release, Oct 17, 2012]

Penguin Computing today announced the immediate availability of its Ultimate Data X1 (UDX1) system. The UDX1 is the first server platform offered by a North American system vendor that is built on the ARM®-based EnergyCore System on Chip (SoC) from Calxeda.

The UDX1 brings new levels of efficiency and scale to internet datacenters. With a five Watt power envelope per server the UDX1 is ideal for I/O bound workloads including “Big Data” applications, scalable analytics and cloud storage. The UDX1 offers a drastic reduction of TCO for high-density, low power computing environments. Workloads that have been processed by racks of conventional systems can now be handled by a group of servers in a single physical unit. The UDX1 features a modular architecture that can be configured with up to 48 Calxeda EnergyCore server nodes, with four cores per node. The system includes an internal 10 Gigabit Ethernet switch fabric for node-to-node connectivity and provides up to 144TB of hard drive capacity.

“Power and cooling are the biggest facility challenges for most data centers, on the other hand typical cloud computing, web 2.0 and ‘Big Data’ applications are based on scale out architectures,” said Charles Wuischpard, CEO of Penguin Computing. “A new generation of power efficient high density servers is required to run these workloads efficiently. With the incredibly low power envelope and the extremely high density Calxeda’s EnergyCore SoCs offer, the UDX1 is the ideal platform for running these types of workloads.”

“Penguin is an innovator in Linux based solutions for internet datacenters and high performance computing. We are thrilled that their next generation of innovative products includes Calxeda,” said Barry Evans, CEO Calxeda. “We are realizing this new era in breakthrough low power computing that will lift the constraints on datacenter performance and efficiency. Penguin is helping chart this course with an ideal solution to span from scale-out cloud storage to analytics.”

Penguin Computing will be showing a live demo of Hadoop running on the UDX1 at the upcoming Strata Conference + HadoopWorld on October 24-25 in New York.
For more information, please visit www.penguincomputing.com.

About Penguin Computing

For well over a decade Penguin Computing has been dedicated to delivering complete, integrated Enterprise and High Performance Computing (HPC) solutions that are innovative, cost effective, and easy to use. Penguin offers a complete end-to-end portfolio of products and solutions including workstations, rack-mount servers, custom server designs, power efficient rack solutions and turn-key clusters. Penguin also offers the Scyld suite of software products for efficient provisioning and infrastructure monitoring. For users who want to use supercomputing capabilities on-demand and pay as they go, Penguin provides Penguin Computing on Demand (POD), a public HPC cloud that is available instantly and as needed.

Penguin counts some of the world’s most demanding organizations as its customers, including Yelp, Caterpillar, Life Technologies, Dolby, Lockheed Martin and the US Air Force. Penguin Computing is a registered trademark of Penguin Computing, Inc. Penguin Computing on Demand is a pending trademark in the US. All other trademarks are property of their respective owners. Other product or company names mentioned may be trademarks or trade names of their respective companies.

Boston Viridis – ARM® Microservers [Boston product page, Oct 18, 2012]
It was announced at ISC 2012 on June 13, 2012, with a whitepaper released simultaneously.

THE WORLD’S FIRST HYPERSCALE SERVER

Hyperscale Computing represents an inflexion point in the industry that will disrupt the very concept of a server in future systems. Modern servers have come a long way, but they are nonetheless fundamentally based around designs originally created decades ago.

The Boston Viridis is a self contained, highly extensible, 48 node ultra-low power ARM® cluster with integral high-speed interconnect and storage within a standard single 2U rack mount enclosure.

Racks of individually connected, high-power, low density servers and blades are installed in modern data centres thousands at a time. Each of these server systems requires its own networking infrastructure, high power distribution, HVAC, and maintenance engineers to take care of it when things go wrong. These inefficiencies could cost data centres billions.

The Boston Viridis uses Server-on-Chip (SoC) technology to integrate the CPU (powered by ARM®), networking and IO onto the server chip. SoC technology, which began life as an embedded systems technology but is primed to storm the data centre in the next few years, allows for mass levels of integration at high density requiring little active cooling. With this technology, we can now configure over a thousand servers in a standard 42U rack.

The Boston Viridis uses the ARM®-based Calxeda EnergyCore® to create a rack mountable 2U server cluster. The solution comprises 192 processing cores, leading the way towards energy efficient hyperscale computing.

Each 2U chassis contains a total of 12 Calxeda EnergyCards connected to a common mainboard sharing power and fabric connectivity. The Calxeda EnergyCard is a single PCB module containing 4 Calxeda EnergyCore SoCs; each with 4GB DDR-3 Registered ECC Memory, 4 x SATA connectors and management interfaces.

Ethernet switching is handled internally by 80Gb bandwidth on the EnergyCore fabric switch, thereby negating the need for additional switches that consume unnecessary power and add unwanted latency.

Astonishingly, utilising all 48 Calxeda EnergyCore SoCs, the whole package including fabric and management consumes less than 300W – this is achieved as each SoC device consumes just 0.5 to 5 watts of power (depending on load).
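The density and power figures in the Boston description follow from a handful of per-unit numbers. The sketch below recomputes them from the values given above; it assumes, for illustration only, that a 42U rack is filled entirely with 2U chassis and leaves no space for switches or power distribution.

```python
# Per-unit figures taken from the Boston Viridis description.
SOCS_PER_ENERGYCARD = 4
ENERGYCARDS_PER_CHASSIS = 12
CORES_PER_SOC = 4
RAM_GB_PER_SOC = 4
MAX_WATTS_PER_SOC = 5          # upper end of the quoted 0.5 to 5 W range
CHASSIS_HEIGHT_U = 2
RACK_HEIGHT_U = 42             # assumption: every rack unit holds a chassis

nodes_per_chassis = SOCS_PER_ENERGYCARD * ENERGYCARDS_PER_CHASSIS
cores_per_chassis = nodes_per_chassis * CORES_PER_SOC
ram_gb_per_chassis = nodes_per_chassis * RAM_GB_PER_SOC
max_soc_watts_per_chassis = nodes_per_chassis * MAX_WATTS_PER_SOC
chassis_per_rack = RACK_HEIGHT_U // CHASSIS_HEIGHT_U
servers_per_rack = chassis_per_rack * nodes_per_chassis

print(f"Nodes per 2U chassis:  {nodes_per_chassis}")
print(f"Cores per 2U chassis:  {cores_per_chassis}")
print(f"RAM per 2U chassis:    {ram_gb_per_chassis} GB")
print(f"SoC power at full load: {max_soc_watts_per_chassis} W")
print(f"Servers per 42U rack:  {servers_per_rack}")
```

Under these assumptions the SoCs alone stay below the quoted 300W chassis figure, and a full rack holds just over a thousand server nodes, in line with the claim above.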

With specific applications, the overall combined performance of one 2U Boston Viridis appliance can outperform a whole rack of standard x86 servers, yet at the same time consume 1/10th the power and occupy 1/10th the space making it an excellent investment for datacentres and enterprises alike.

SystemFabriCore [product page from System Fabric Works, Nov 30, 2012]
It was announced at SC12 on Nov 12, 2012 with demonstration at the Calxeda booth.

The First, and Next Step in Hyper-Efficient Computing

The SystemFabriCore is an Ultra Dense, Ultra Low Power Computing Platform based on a revolutionary new approach to highly parallel, densely packaged, tightly integrated systems utilizing the Calxeda EnergyCore™ SOC (System on a Chip) which delivers computing, fabric, network, storage I/O and management, all in one 3.8 watt SOC as opposed to a traditional x86 motherboard based architecture using 100s of watts.

The SystemFabriCore is a self-contained, highly extensible, multi-node cluster with integral high-speed interconnect and storage within a standard 2U rackmount enclosure.

  • Available with up to 48 SoC components delivered on 12 Calxeda EnergyCard platforms
  • Each Calxeda EnergyCore™ SoC contains a quad-core processing unit, providing a total of 192 cores per 2U enclosure
  • 24 x 2.5” SATA HDDs or SSD devices
  • 4 x 4GB miniDIMM modules per EnergyCard, providing a total of 192GB of RAM per 2U enclosure
  • Rear I/O supporting 4 x SFP+ cages for external fabric connectivity (1GbE or 10GbE depending on configuration and number of EnergyCards) and 1 x serial port for management.

FEATURES:

  • Easily scalable to thousands of nodes
  • Calxeda EnergyCore™ SoC Redefines Big Data Efficiency
  • Each EnergyCore™ contains an ARM® Cortex™-A9 Quad-core CPU
  • Up to 10X the performance in the same power and space
  • Cuts energy use and space by up to 90%
  • Industry leading low power consumption 3.8 watts per SoC
  • Up to 24 SATA HDDs or SSD per 2U
  • Up to 192GB of RAM per 2U enclosure
  • Total of 192 cores per 2U

SystemFabriCore Datasheet


ARM Holdings on the server opportunity

ARM in Servers: Taming Big Data with Calxeda @ ISC’12 [ARM Holdings’ Smart Connected Devices blog, June 18, 2012]

I spent a number of years working in High Performance Computing (HPC) and found it to be one of the most innovative communities I’ve had the pleasure to work with. That’s why I’m certain they’re going to be excited to see and hear what Calxeda, an ARM® Connected Community® partner, has to offer at ISC’12 this week! Spoiler alert: they’ll be sharing some new performance and Total Cost of Ownership (TCO) data that shows just how compelling a right-sized solution can be for the target workloads. And what do I mean by ‘right-sized’ solution? More on that in a moment…

First, I’d like to offer kudos to the HPC community for tackling some of the largest and most complex problems known. Unsung heroes in so many aspects of our everyday life – for example, have you ever wondered how cars continue to get safer and more efficient each year? (Hint: they use lots of computers to model and simulate scenarios to improve safety and efficiency.) Similar techniques are used to uncover new medicines, forecast weather, identify new energy sources and predict future environmental impacts to name just a few. Then there’s ‘Big Data’ which applies HPC-like techniques to mine the ever-increasing sources and quantities of unstructured data (search queries, social media, financial transactions, crime reports, live traffic, smart meters etc…) for seemingly unrelated but extremely interesting (read: valuable) patterns and insight.

To tackle a large project, you typically break it down into smaller manageable chunks. In the case of HPC and Big Data, that means decomposing and distributing data across many servers (think hundreds and in some cases thousands or even tens of thousands), then collecting and consolidating the results into an overall ‘solution.’ Today, this is typically performed using a technique such as MapReduce enabled by software from companies like Cloudera, Datastax, MapR and Pervasive running on a cluster of general-purpose servers connected via high-performance networks. Often the compute requirements are somewhat modest relative to the enormity of the data, meaning unimpeded data movement is fundamental to overall efficiency.
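The decompose, distribute and consolidate pattern described above can be sketched in a few lines. This is a toy, single-process illustration of the MapReduce idea, not the API of any of the products named; the function names and the sample records are made up for the example, and on a real cluster each chunk would be processed on a separate server.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(records, n_chunks):
    """Decompose the input into roughly equal chunks, one per simulated server."""
    return [records[i::n_chunks] for i in range(n_chunks)]


def map_chunk(chunk):
    """Map step: each server counts words in its own chunk only."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial results into one."""
    return left + right


# Hypothetical sample data, purely for illustration.
records = [
    "big data needs many servers",
    "many small servers move data",
    "data movement limits efficiency",
]

chunks = split_into_chunks(records, n_chunks=2)
partials = [map_chunk(chunk) for chunk in chunks]   # would run in parallel on a cluster
total = reduce(reduce_counts, partials, Counter())

print(total.most_common(3))
```

Each map step touches only its own chunk, so the work per server is modest; the cost that grows with cluster size is moving the partial results between nodes, which is why the passage stresses unimpeded data movement.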

With that as a backdrop, think for a moment – “how would you architect highly efficient servers for this purpose if you had a clean slate?” ARM’s business model enables innovative companies the freedom and choice to do just that, resulting in highly efficient and targeted solutions.

As stated before, one size no longer fits all.

Achieving a step function in efficiency often requires new thinking. In the case of data intensive computing, re-balancing or ‘right-sizing’ the solution to eliminate bottlenecks can significantly improve overall efficiency. That’s exactly what Calxeda has done with its EnergyCore™ ECX-1000 series processor. By combining a quad-core ARM® Cortex™-A series processor with topology agnostic integrated fabric interconnect (providing up to 50Gbits of bandwidth at latencies less than 200ns per hop), they can eliminate network bottlenecks and increase scalability. EnergyCore also includes all the traditional server I/O, memory and management interfaces you would expect. This ‘just add memory’ server on a chip approach means servers can literally be credit card sized and operate at a power-sipping 5W of total power. That means huge density increases are also possible.

Click here for more details on the Calxeda EnergyCore ECX-1000 SoC.

With all this innovation, it’s easy to get caught up in the hardware, but we also need to recognize software plays an important role here. While the ecosystem is coming together quite nicely with Canonical’s Ubuntu Server 12.04 LTS release and various open source libraries already available, there’s still much work ahead. As of today, the fundamental pieces are in place to begin doing useful work and key software partners are already engaged with Calxeda on early access hardware. Forthcoming availability of ARM processor-based server systems from HP and other OEMs will accelerate the next phase of software ecosystem developments.

If you’re at ISC’12 this week and want to know more, be sure to visit Calxeda at booth #410, and check out Karl Freund’s speaking session on the show floor Tuesday, June 19th at 4:15pm. If you’re not at ISC’12 we’ll also be at SC’12 in November (booth #122.) But trust me you don’t want to be left waiting until then! There are plenty of other opportunities throughout June (including GigaOM Structure 2012 in San Francisco.) And we’ll be announcing more opportunities to meet the Calxeda and ARM teams in the near future so be sure to watch this space!

Jeff Underhill, Server Segment Marketing Manager, ARM, is based in Silicon Valley. After spending 10+ years working in the traditional server market Jeff saw an opportunity to revisit server design and redefine an industry. ARM’s business model enables innovative companies the freedom and choice to ask themselves “how would I architect highly efficient servers if I had a clean slate?” Consequently, he is helping drive ARM’s server program with a view to redefining the boundaries of traditional servers as opposed to simply replacing incumbent platforms.

ARM Cortex-A50: Broadening Applicability of ARM Technology in Servers [ARM Holdings’ Smart Connected Devices blog, Oct 31, 2012]

I have been running the ARM® server initiative for a little over four years. At kickoff, there were few that believed that ARM technology would find its way into server applications. Fast forward to today, more of the strands of the strategy are now in the public domain.

  • 32-bit ARM powered platforms, from companies that include Boston, Dell, HP, Mitac and Penguin Computing (based on either Marvell’s or Calxeda’s EnergyCore system-on-chip devices) are starting to ship into the market. Customers can start to evaluate the performance of their workloads on ARM based servers hosted in the cloud.
  • The initial pieces of the software ecosystem are starting to appear, including performance optimized Java compilers/Java virtual machines, commercial grade Linux distributions and application stacks.

For companies developing businesses based on web infrastructure, the server IS the business. These companies have honed their software and hardware strategies to enable quick adoption of technologies that drive down system acquisition costs or running costs. Increased use of open source software on a Linux platform reduces the legacy ties to incumbent server platforms and paves the way for more innovation. Companies are now making decisions on system technologies based on metrics like performance (on the user application) / watt / $ or performance / watt / foot³ as opposed to pure performance.

ARM has consistently indicated that a relatively small set of server applications could take advantage of a 32-bit ARM processor and that the availability of 64-bit ARM devices would significantly broaden the applicability. In the cloud infrastructure space, the main benefit that the 64-bit execution state brings is access to a larger memory address space. 2014 will be the year when we see 64-bit ARM powered server SoCs appearing in the market. Now surely those will all be based on the ARM Cortex™-A57, right? Well, what we have learnt in the server journey is that one size does not fit all. Some server workloads do benefit from a high single thread performance. However, as Brian Jeff notes in his blog [see big.LITTLE in 64-bit, also copied here just below], for applications that have modest compute requirements, the Cortex-A53 processor will deliver the best throughput performance inside a specific power envelope.
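To make the "larger memory address space" point concrete, the sketch below computes how much byte-addressable memory a flat address of a given width can reach. It is plain arithmetic with a hypothetical helper name; real processors typically implement fewer physical address bits than the architectural maximum, so the 64-bit figure is a ceiling rather than a product specification.

```python
def addressable_gib(address_bits):
    """Size in GiB of a flat, byte-addressed space with the given address width."""
    return 2 ** address_bits / 2 ** 30


for bits in (32, 64):
    print(f"{bits}-bit addresses reach {addressable_gib(bits):,.0f} GiB")
```

A 32-bit address reaches 4 GiB, which matches the 32-bit address size and 4 GB memory size listed for the ECX1000 in the Calxeda comparison earlier in this post.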

We think our cores are a great base for server devices. But just as important is the ARM business model, which enables our silicon licensees to tightly couple peripherals, memory and processing engines on the same piece of silicon. The selection of this mix of functionality that balances the compute, networking and storage elements for the specific server application is key to driving advantages in the metrics discussed above.

But a chip is useless without software. Earlier this year, ARM released a 64-bit Linux distribution and tools into the open source community. The primary focus of my team is to ensure the multiple commercial grade Linux distributions pick up this technology, augmented with virtualization and application stacks, all in time to intersect silicon availability. Fortunately, we have an early pioneer in the ARMv8 space. At ARM TechCon™ 2011, Applied Micro announced their intent to develop a 64-bit ARM powered server device. ARM demands compatibility between companies that develop their own ARM processors (achieved through an architecture license) and cores that ARM licenses. Software companies are already developing software for use on ARMv8 processors using an FPGA version of Applied Micro’s X-Gene device. This will be superseded with real silicon, set to appear in the early part of 2013. You can expect to see more announcements about the progress regarding 64-bit server software in the coming quarters.

Some observers remain skeptical as to ARM’s likelihood of success here. My team is immersed daily in this engagement so it is fair to say we are somewhat passionate and evangelical about our chances. What I think we can agree on is that the announcement of the Cortex-A50 series removes a technical barrier that many have argued prevents ARM’s access into the server domain. The list of lead partners of these cores, such as AMD and Calxeda, augmented with the three publicly announced ARMv8 architecture licensees (Applied Micro, Cavium and NVIDIA) is an early indicator that choice is coming to the server domain. One size does not fit all. The winners will be those that best deliver relevant, compelling functionality alongside the processor core. A space long devoid of innovation is about to undergo some significant disruption!

Ian Ferguson, Director of Server Systems and Ecosystem, ARM, has spent years fighting from the corner of the underdog. Most of those scars are healing nicely. Ian is particularly passionate about taking ARM technology into new types of applications that do not exist or are at the very formative stages. Consequently, he is driving ARM’s server program with a view to reinvent the way the server function is implemented in networks as opposed to simply replacing incumbent platforms.

big.LITTLE in 64-bit [ARM Holdings’ SoC Design blog, Nov 1, 2012]

With the ARM® Cortex™-A50 series processors, ARM has introduced a “big” and “LITTLE” processor pair that is 64-bit capable. So with this 2nd generation of big.LITTLE platform, what does this mean for big.LITTLE software, which is currently being readied for deployment on ARMv7 32-bit processors? How will big.LITTLE processing technology be used in applications outside mobile like low-power servers, where 64-bit processing is a growing requirement?

Preparing for 64b Operating Systems

To start with, I should highlight that big.LITTLE software operates at the level of the operating system, in kernel space. To be clear, this means it is completely transparent to all apps and middleware. In both the major modes of operation (CPU migration and big.LITTLE MP) (discussed in more detail elsewhere) the software consists of a relatively small patch set to the OS kernel. Today, these patches are written in ARMv7 code, available in the open source or from Linaro. The Cortex-A50 series processors support the AArch32 execution state which is 100% backward compatible with ARMv7, so a Cortex-A50 series big.LITTLE processor can run existing 32-bit kernels without any major changes, including kernels that have been patched to support big.LITTLE. There will be some changes in cache maintenance routines, but effectively the big.LITTLE software is the same.

This is important as we are continuously improving the ARMv7 big.LITTLE code base. The first generation of devices based on big.LITTLE processors is expected in the market in 2013.

ARMv8 allows 64-bit and 32-bit operation. AArch64 is the architecture that describes 64-bit mode of operation and AArch32 describes the 32-bit mode of operation. AArch64 also delivers other architectural benefits like enhanced SIMD, larger register files, enhanced cache management, tagged pointers, and more flexible addressing modes. For a big.LITTLE processor to deliver the architectural benefits of AArch64, it must run a 64-bit OS built on AArch64.

ARM 64-bit Linux has already been up-streamed, and ARM has demonstrated Android 32-bit code running (unmodified) on top of the 64-bit Linux kernel. The next step in providing big.LITTLE support in the 64-bit kernel is to modify the big.LITTLE MP and CPU migration patch sets to work cleanly in the AArch64 environment. Fortunately the code is not strongly impacted by register width, and therefore the vast majority should port cleanly and with little effort from ARMv7 to 64 bit; we plan to do this work at ARM and release 64-bit capable patch sets in mid-2013. This lines up well with expected Cortex-A50 based SoCs sampling at the end of 2013 and deployed in products in 2014.

Although we don’t expect 64-bit mobile OS’s to become prevalent that early, the AArch32 mode of the Cortex-A50 series processors will handle the ARMv7 32b OS, and will be ready for the transition to 64-bit when it does occur.

big.LITTLE in the Enterprise?

Originally conceived as an energy savings technique for mobile phones, big.LITTLE can be viewed as an interesting disruptive technology for applications like ARM processor based low-power servers. For servers and networking applications which are generally memory bound, having a large number of efficient processors that are tuned to workload makes a lot of sense. Often this workload lends itself to having multiple cores at different performance levels, but which are software identical.

As performance scales to higher core counts and the system power budgets reduce, the amount of power budget left for the CPU even in enterprise is very similar to that of mobile. Consider a fanless 20-25 W chip that has 16 CPUs, IO devices, a large L3 cache and other accelerators on board. Once you strip out the budgets for the non-CPU portions and split the remainder amongst the 16 CPUs, the budget is very similar to a mobile phone power budget. big.LITTLE allows system designers to have their cake and eat it by delivering enterprise performance using mobile-pedigree processors and a resultant low-cost, fanless device.

The other aspect of big.LITTLE technology that is attractive is the ability to more efficiently support a dynamically varying level of required performance. Infrastructure equipment is typically designed for the peak operating capacity, for example, to support the call volume on Mother’s Day or the mobile internet traffic during the Super Bowl. On most days the traffic is at most half of the peak traffic. An architecture that includes a mix of big and LITTLE cores in the same system, or even on the same die, can be dynamically adapted to the performance needs of the network more efficiently. This leads to better overall power consumption and reduced TCO.

big.LITTLE MP software, which gives the OS full view of all the big and LITTLE processors in the system, can automatically handle the work allocation in such a system. This mode of scheduling is more appropriate to the enterprise use case than CPU migration. CPU migration leverages dynamic voltage and frequency scaling (DVFS) to trigger the move between big and LITTLE cores. This works well in mobile devices which typically employ DVFS, but is not as suitable for enterprise systems which typically do not. Now that big.LITTLE MP has been effectively demonstrated on real silicon, enterprise partners are evaluating how big.LITTLE can help them achieve their performance goals without blowing the power budget.
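The idea of an OS that sees every big and LITTLE core and allocates work across them can be illustrated with a toy placement routine. This is only a sketch under stated assumptions: the `Core` class, the `place` function, the capacity units and the task names are all hypothetical, and the greedy "smallest core that fits" policy is not the algorithm used by the actual big.LITTLE MP kernel patches.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    name: str
    capacity: int                 # relative throughput units (made-up values)
    load: int = 0
    tasks: list = field(default_factory=list)

    def free(self):
        return self.capacity - self.load


def place(tasks, cores):
    """Put each task on the lowest-capacity core that can still hold it."""
    # Heaviest tasks first, so they claim big cores before light tasks spread out.
    for name, demand in sorted(tasks, key=lambda t: -t[1]):
        candidates = [c for c in cores if c.free() >= demand]
        if not candidates:
            raise RuntimeError(f"no core can hold {name}")
        # Prefer LITTLE (small capacity); among equals, prefer the emptier core.
        target = min(candidates, key=lambda c: (c.capacity, -c.free()))
        target.load += demand
        target.tasks.append(name)
    return cores


# Hypothetical system: two big cores and two LITTLE cores.
cores = [Core("big0", 100), Core("big1", 100),
         Core("little0", 40), Core("little1", 40)]
# Hypothetical workload: (task name, demand in the same units).
tasks = [("video", 80), ("net", 30), ("log", 10), ("cron", 10)]
place(tasks, cores)

for c in cores:
    print(c.name, c.tasks)
```

With these made-up numbers only one big core ends up busy; the second big core receives no work and could be left idle, which is the kind of saving the text describes for off-peak traffic.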

In servers, the benefits of big.LITTLE are still under investigation. There is tremendous interest in ARM based low-power servers, where even our “big” Cortex-A57 CPU will consume significantly lower power than incumbent solutions. With increasing pressure on OEMs to create power efficient servers, it is clear that high peak performance CPUs do not always equate to the best solution. One CPU size does not fit all. For many classes of server solutions, aggregate throughput is more important than peak performance. In these applications, a many core approach with lots of LITTLE Cortex-A53 processors delivers the highest level of aggregate performance under a reduced power budget. It is likely that a range of power efficient server products will be built around Cortex-A57 or Cortex-A53, but probably not with both on the same chip. The OS software will be ready to cope with either case, big.LITTLE or homogeneous multi-core, as the market evolves.

Brian Jeff, Product Manager at ARM, is based in Austin. Brian focuses on the power efficient vector along ARM’s application processor roadmap, including the Cortex-A5, the newly introduced Cortex-A7, and other CPUs further down the roadmap. He has also focused on benchmarking, performance analysis, and power analysis for ARM CPUs and systems. Brian joined ARM 3 years ago; prior to joining ARM he spent time at Texas Instruments and Freescale Semiconductor in product marketing, product management, and applications engineering roles. He has an MBA from the University of Texas at Austin and a BSEE from Virginia Tech.


x86 on ARM with Linux

Here comes the emulators! (EE Times Article) [‘ARM Servers Now’ blog from Calxeda, Oct 3, 2012]

Remember how smoothly Apple transitioned from PowerPC chips to x86 back in the mid-2000s? Customers hardly noticed that all their software “just worked” on a completely different ISA, thanks to some cool software built by “Transitive”, a small UK based company since gobbled up by IBM. Well, emulation doesn’t solve ALL the world’s problems, and critical applications will of course need to go native for maximum performance. But this approach can be very helpful with the CAO, or Computer Aided Other; the ancillary but important applications, tools, and utilities that are so pervasive in a datacenter.

Below is an excerpt from the EE Times article, ARM Gets Weapon in Server Battle Vs. Intel.

Russian engineers are developing software to run x86 programs on ARM-based servers. If successful, the software could help lower one of the biggest barriers ARM SoC makers face getting their chips adopted as alternatives to Intel x86 processors that dominate today’s server market.

Elbrus Technologies has developed emulation software that delivers 40 percent of current x86 performance. The company believes it could reach 80 percent native x86 performance or greater by the end of 2014. Analysts and ARM execs described the code as a significant, but limited option.

A growing list of companies (including Applied Micro, Calxeda, Cavium, Marvell, Nvidia and Samsung) aim to replace Intel CPUs with ARM SoCs that pack more functions and consume less power. One of their biggest hurdles is their chips do not support the wealth of server software that runs on the x86.

The Elbrus emulation code could help lower that barrier. The team will present a paper on its work at the ARM TechCon in Santa Clara, Calif., Oct. 30-Nov. 1.

The team’s software uses 1 Mbyte of memory. “What is more exciting is the fact that the memory footprint will have weak dependence on the number of applications that are being run in emulation mode,” Anatoly Konukhov, a member of the Elbrus team, said in an e-mail exchange.

The team has developed a binary translator that acts as an emulator, and plans to create an optimization process for it.

“Currently, we are creating a binary translator which allows us to run applications,” Konukhov said. “Implementation of an optimization process will start in parallel later this year–we’re expecting both parts be ready in the end of 2014.”

Work on the software started in 2010. Last summer, Elbrus got $1.3 million in funding from the Russian investment fund Skolkovo and MCST, a veteran Russian processor and software developer. MCST also is providing developers for the [Elbrus] project.

Emulation is typically used when the new architecture has higher performance than the old one, which is not the case, at least today, moving from the x86 to ARM.

“By the time this software is out in 2014 you could see chips using ARM’s V8, 64-bit architecture,” Krewell noted.

“That said, you will lose some of the power efficiency of ARM when doing emulation,” Krewell said. “Once you lose 20 or more percent of efficiency, you put ARM on par with an x86,” he added.

Emulation “isn’t the ideal approach for all situations,” said Ian Ferguson, director for server systems and ecosystem at ARM. “For example, I expect native apps to be the main solution for Web 2.0 companies that write their own code in high level languages, but in some areas of enterprise servers and embedded computing emulation might be interesting,” he said.

Russian Chip Gurus ARM Intel Rivals With Secret Weapon [Wired, Oct 5, 2012]

Elbrus was founded in 2010 by employees of MCST — the company behind the Russian computer system also called Elbrus. In 2012, MCST and the Russian investment fund Skolkovo invested $1.3 million into the new Elbrus Technologies.

At MCST, the startup team was part of the Binary Translation Department building x86 emulators for the Russian microprocessor E2K. According to Konukhov, their emulator performed 85 percent as well as native code. They also took part in a joint project with Intel to develop an x86 translator for Intel’s Itanium chip that achieved 90 percent of native performance. Konukhov says that MCST has published 46 journal articles on binary translation, and that the company has several USA patents in the field.

Elbrus Technologies’ secret sauce is its binary translator with multiple layers of hand-tuned optimization. And all the translations are handled in memory to speed up the process, with the translator itself taking up just 1MB of memory.

Although the goal is to reach 80 percent of the performance of native ARM, Konukhov says stability is more important. “Our marketing research clearly shows that most vendors and users are interested in functionality and stability rather than performance,” he says. “It is possible for us to release our solution without fully reaching performance goals and enhancing it afterwards.”

Linux 3.7 lived up to the hopes of ARM developers [PC Week/Russian Edition, Dec 13, 2012]

The Russian company Elbrus Technologies, a microprocessor developer, is preparing to solve this problem. The company is developing an efficient emulator for running x86 applications on ARM hardware. The work is currently at the alpha stage. The company intends to release a working public beta of the product by 2013, and by 2014 to reach an efficiency of at least 80% and bring the product to market.

Few companies run ARM servers today, so the market for an x86 emulator is small. Some enterprises, however, are very interested in saving money by moving to ARM servers, and it is for them that the Elbrus Technologies work may be useful, all the more so because the company building the x86 emulator for ARM has experience in binary code translation, and the new ARM environment is being built by hand to take maximum account of the specifics of the new systems.

https://twitter.com/eltechs/status/275192193982009345
Elbrus Technologies @eltechs

Skolkovo have chosen Eltechs as one of the Success Story in scope of October’s 2012 report: http://community.sk.ru/press/our_results/p/oktober_2012.aspx …

2:58 AM – 2 Dec 12

x86 running on ARM! [Low Power Servers [.com], Oct 16, 2012]

Today marked an important milestone in our product testing and development for our Viridis platform here at Boston. We can now officially confirm that we have run x86 binaries on our ARM based Viridis platform!

Over the last few weeks, we have been working with a group of engineers, from Eltechs, who are developing software to run x86 programs on ARM-based servers. This software could help lower one of the largest barriers to ARM SoC adoption as alternatives to Intel x86 processors in the datacentre.

Eltechs has developed a binary translator that acts as an emulator. The software currently delivers on average around 45% of native ARM performance. During our tests on the Viridis platform we observed up to 65% of native performance (6 tests were run covering a range of workloads; details cannot be published at this time). We will be working on our Viridis platform with Eltechs, which believes it could reach 80% of native ARM performance or greater in the future.

Of all the ARM products tested by Eltechs, we were delighted to hear our platform was received well:

“The Boston server has been the fastest platform we have tested to date,” said Vadim Gimpelson, CEO of Eltechs.

We will continue to work with Eltechs in testing and validating our platform and hope to see further improvements as the software matures. In addition to our successful initial tests, we will be adding this software to the Boston ARM Wrestle program so if anyone has a particular code or application that hasn’t been ported to ARM, please get in touch with us at hpc@boston.co.uk to discuss benchmarking on our test cluster.

Boston Viridis ARM Server Gets x86 Binary Translation Support [AnandTech, Oct 18, 2012]

We covered the launch of the Calxeda-based Boston Viridis ARM server back in July. The server is making its appearance at the UK IP EXPO 2012. Boston has been blogging about their work on the Viridis over the last few months, and one of the most interesting aspects is the fact that x86 binary translation now works on the Viridis. The technology is from Eltech, and they have apparently given the seal of approval to the Calxeda platform by indicating that the Boston Viridis was the fastest platform they had tested.

Eltech seems to be doing dynamic binary translation, i.e., x86 binaries are translated on the fly. That makes the code a bit bulky (heavier on the I-Cache). The overhead is relatively large compared to, say, VMware’s binary translator (BT) that does x86 to x86, because of the necessity to translate between two different ISAs.

Eltech uses a 1 MB translator cache (similar to the translator cache of VMware’s BT), which means they can reuse earlier translations. The translation overhead will thus decrease quickly over time if most of the critical loops fit in the translator cache. But it also means that only code with a relatively small footprint will run fast, e.g. get the promised 40-65% of native performance.
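The effect of a bounded translator cache on code footprint can be sketched in a few lines. This is a toy model only: the `TranslationCache` class, its block-count capacity, the placeholder `_translate` step and the least-recently-used eviction policy are all assumptions for illustration, since the article does not describe how Eltech's cache is organised.

```python
from collections import OrderedDict


class TranslationCache:
    """Bounded cache of translated blocks; evicts the least recently used."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()   # guest address -> translated code
        self.translations = 0         # how many times the slow path ran

    def _translate(self, guest_addr):
        # Placeholder for the expensive x86 -> ARM translation step.
        self.translations += 1
        return f"arm_code_for_{guest_addr:#x}"

    def lookup(self, guest_addr):
        if guest_addr in self.blocks:
            self.blocks.move_to_end(guest_addr)   # mark as recently used
            return self.blocks[guest_addr]
        code = self._translate(guest_addr)
        self.blocks[guest_addr] = code
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)       # evict the oldest entry
        return code


def run(cache, trace):
    """Execute a trace of guest block addresses; return translations needed."""
    for addr in trace:
        cache.lookup(addr)
    return cache.translations


# A hot loop over 4 blocks fits in an 8-block cache: translated once each.
small = run(TranslationCache(8), [0x1000, 0x1010, 0x1020, 0x1030] * 100)

# Cycling through 16 blocks with the same cache misses on every access.
large = run(TranslationCache(8), list(range(0x2000, 0x2100, 0x10)) * 100)

print(small, large)
```

In this model the small loop pays the translation cost 4 times in total, while the larger working set pays it on all 1600 accesses, which mirrors the point that only code whose critical loops fit in the cache runs fast.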

Most server applications have a relatively large instruction memory footprint, so it is unclear whether this approach will help to run any heavy server software. Some HPC software has a small memory footprint, but since HPC users tend to pursue performance most of the time, this technology is unlikely to convince them to use ARM servers instead of x86.

In general, the BT software will be useful in the not uncommon case where a complex web application comprises multiple software modules, one small piece of which is not open-source and for which the vendor does not offer an ARM based binary. So, the Eltech solution does handle a small piece of the puzzle. x86 emulation is thus a nice-to-have feature, but most ARM based servers will be running fully optimized and recompiled Linux software. That is the target market for products such as the Boston Viridis.

IP Expo: Boston Brings World’s First ARM Server To The UK [TechWeekEurope, Oct 18, 2012]

Low-power ARM-based Viridis servers manufactured by Boston Limited have made their UK debut at the IP Expo 2012 in London.

Boston is the world’s first company to make servers based on ARM processor technology, commonly used in smartphones and tablets.

The Viridis is the first system to approach the much talked about concept of Hyperscale, involving very high density systems that are only possible with low heat, low power chips.

The flying ARM server pig

Boston Viridis is based on the Calxeda EnergyCore System-on-a-Chip (SoC) which provides “supercomputer performance” while delivering a 90 percent reduction in energy costs when compared with conventional servers. Since every SoC consumes as little as 5 Watts of power, the system needs little active cooling, lowering maintenance costs even further.

Provisioned within a 2U enclosure, each Viridis unit contains up to 12 quad-node Calxeda EnergyCards with built-in Layer-2 networking. The EnergyCard is a single PCB module containing four EnergyCore SoCs, each with 4GB DDR-3 registered ECC memory, four SATA connectors and management interfaces.

Providing up to 192 cores and 48 nodes per enclosure, this highly dense solution can put up to 900 servers into a single industry standard 42U rack.
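The density figures can be cross-checked with a little arithmetic. The sketch below uses only numbers quoted in the article (192 cores across 48 nodes implies four cores per node); the one assumption, flagged in the comments, is a rack filled entirely with 2U Viridis enclosures, which gives an upper bound rather than a realistic configuration.

```python
# Figures quoted in the article.
CARDS_PER_ENCLOSURE = 12   # EnergyCards per 2U Viridis enclosure
NODES_PER_CARD = 4         # EnergyCore SoCs per EnergyCard
CORES_PER_NODE = 4         # implied by 192 cores / 48 nodes
ENCLOSURE_HEIGHT_U = 2
RACK_HEIGHT_U = 42

nodes_per_enclosure = CARDS_PER_ENCLOSURE * NODES_PER_CARD     # 48
cores_per_enclosure = nodes_per_enclosure * CORES_PER_NODE     # 192

# Assumption: every rack unit is occupied by Viridis enclosures (upper bound).
max_enclosures_per_rack = RACK_HEIGHT_U // ENCLOSURE_HEIGHT_U  # 21
max_nodes_per_rack = max_enclosures_per_rack * nodes_per_enclosure

print(nodes_per_enclosure, cores_per_enclosure, max_nodes_per_rack)
```

The upper bound comes out at 1008 nodes, so the quoted "up to 900 servers" per 42U rack sits below a fully packed rack; the article does not say how the 900 figure was derived.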

“These building blocks of high end computing are set to radically change the economics of large scale data centres, sparking innovation in emerging fields such as cloud computing, data modelling and analysis – often called ‘Big Data’ – scientific research and media streaming,” said David Power, head of HPC at Boston.

In the Viridis, Ethernet switching is handled internally by 80Gb bandwidth on the EnergyCore fabric switch, thereby negating the need for additional switches that consume unnecessary power and add unwanted latency.

The servers are supported by Ubuntu Server 12.04 LTS and Fedora v17+ distributions. They have been shown to run cloud management software from Openstack, Big Data tools Hadoop and Cassandra, applications built in Java, Ruby on Rails and Python.

Earlier this month, Boston and Russian software developers Eltech had managed to run x86 binaries on the Viridis platform, proving that in the future ARM servers could pose a serious threat to the Intel silicon in the data centre.

Boston claims that with specific applications, one 2U Viridis appliance can outperform a whole rack of standard x86 servers, yet at the same time consume one tenth of the power and occupy one tenth of the space.

Russian startup working on Intel to ARM software emulator [ITworld, Oct 9, 2012]
[Russian version: “A tool for migrating software from x86 to ARM has been developed”]

Elbrus Technologies in Moscow is developing an x86 to ARM binary translator for use on ARM-powered servers

A Russian startup company called Elbrus Technologies is developing a technology that will allow data center owners to migrate software designed for x86 platforms to ARM-powered servers without the need to recompile it.

Because of their very low power consumption, ARM processors are used today in most smartphones, tablets and in a wide variety of embedded devices.

However, ARM chips are also expected to gain a foothold in the server market, which is currently dominated by x86 processors, during the next few years. Hewlett-Packard and Dell have already announced plans to build low-power servers based on ARM CPUs.

Intel CPUs use up to ten times more power than ARM CPUs and for large data centers power consumption represents 50 percent of their operational costs, said Anatoly Konukhov, the chief business development officer of Elbrus Technologies in Moscow.

In this context it makes sense for many data center operators to consider switching to ARM-based servers in the future. However, a big impediment is that many applications, especially proprietary, closed-source ones, that are designed for x86 CPUs won’t work on ARM processors.

Elbrus Technologies is trying to solve this issue by building an x86 to ARM binary translator application that will allow proprietary software compiled for the x86 architecture to run on ARM-powered servers without any changes.

The software emulation will be transparent to the user, Konukhov said. The emulator will automatically detect when an x86 application is executed and will perform the binary translation, he said.
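One plausible way to detect that a Linux binary targets x86 is to read the `e_machine` field of its ELF header. The sketch below shows that check; it is an illustration, not a description of how the Elbrus Technologies emulator works, and the helper names `elf_machine` and `needs_x86_translation` are hypothetical. The `EM_*` constants and header offsets come from the ELF specification.

```python
import struct

# e_machine values defined by the ELF specification.
EM_386 = 3
EM_ARM = 40
EM_X86_64 = 62
EM_AARCH64 = 183


def elf_machine(header: bytes):
    """Return the e_machine field of an ELF header, or None if not ELF."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian.
    if header[5] == 1:
        fmt = "<H"
    elif header[5] == 2:
        fmt = ">H"
    else:
        return None
    # e_machine follows the 16-byte e_ident and the 2-byte e_type.
    (machine,) = struct.unpack_from(fmt, header, 18)
    return machine


def needs_x86_translation(header: bytes) -> bool:
    """True if the header belongs to a 32-bit or 64-bit x86 ELF binary."""
    return elf_machine(header) in (EM_386, EM_X86_64)
```

A caller would pass in the first 20 or more bytes of the file, for example `open(path, "rb").read(64)`. On Linux, the kernel's `binfmt_misc` facility can route binaries matching such a header pattern to an interpreter automatically; whether this product relies on it is not stated in the article.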

Even though the technology is theoretically platform-independent, the company currently focuses its development efforts on supporting Linux servers and software. Support for Windows software is a longer term goal.

The project started in the spring of 2012 and the product is expected to be ready for beta testing in the middle of next year, Konukhov said. The final product will be released sometime at the end of 2013 or in the beginning of 2014, he said.

“I think we currently support 50 or 60 percent of the functionality of Intel-based CPUs,” Konukhov said. This includes the entire base instruction set of the x86 architecture.

The company is working on adding support for the Streaming SIMD Extensions (SSE) and MMX instruction sets. “This will basically allow us to have multimedia functionality in our applications,” Konukhov said.

The performance of translated code compared to native code is currently at 45 percent. The goal is to have a performance level of 80 percent or more, but that probably won’t be the case for the first production ready version of the product.

“We think it will be lower and there’s a good reason for that,” Konukhov said. “We’ve discussed this issue with our partners and they were more interested in the functionality supported by our emulator and in stability rather than performance. So, they would like to see working and stable software rather than fast software.”

The performance enhancing work will begin after the initial product is released and an 80 or 90 percent performance level is expected to be achieved in a matter of months, Konukhov said.

The company worked with partners and potential customers to determine which applications should be considered a priority for its x86 to ARM binary translation technology. Konukhov declined to name any of those applications because of existing non-disclosure agreements, but said that they are from the financial and healthcare sectors.

A lot of the people working on this project came from MCST, Elbrus Technologies’ parent company, where they worked on developing x86 to Elbrus binary translators, Konukhov said. Elbrus is a Russian microprocessor manufactured by MCST.

Elbrus Technologies raised US$1.3 million in funding from MCST and the Skolkovo Foundation, a non-profit organization tasked by the Russian government to manage grant funds for technology projects. Elbrus is looking for additional investors and business partners, Konukhov said.

 


Boston Ltd. related information from Calxeda

Calxeda EnergyCore-Based Servers Now Available [‘ARM Servers Now’ blog from Calxeda, July 9, 2012]

We spent a lot of time at various tradeshows around the world in June and the #1 question we were asked was “when can I get my hands on a Calxeda-based server?” I am happy to tell you the wait is over.

We have been working with Boston Limited in the UK, a highly respected solution provider, for about a year to bring an excellent Proof of Concept (POC) platform to market called “Viridis”. Boston currently has about 20 customers lined up for beta testing and a pipeline of hundreds of others interested in evaluating the platform. Boston is taking orders now from users in Europe, Asia and the US with shipments beginning later this month.

The Register published a great article today highlighting the features of the Boston Viridis platform:

http://www.theregister.co.uk/2012/07/09/boston_viridis_arm_server/

Boston Viridis is a perfect option for those users who want to port their code, run benchmarks, and optimize their workloads for ARM.  This highly configurable solution allows users to create their ideal initial testing environments with options ranging from 4 to 48 Calxeda EnergyCore server nodes in a 2U form factor.

We look forward to working with Boston and other systems providers to enable the market with Calxeda-based POCs.  Stay tuned as we learn about success stories users experience with Calxeda EnergyCore-based solutions over the coming months.

The World’s First 130 Watt Server Cluster [‘ARM Servers Now’ blog from Calxeda, Oct 25, 2012]

Calxeda’s approach to driving power optimization in the datacenter goes well beyond the processor.  We focus on enabling our partners to achieve rack level power efficiency based on our technology. Last week, Boston Limited announced their 2U Viridis platform with 24 Calxeda EnergyCore(TM) server nodes, 96GB of memory, and 6TB storage is measuring 130W “at the wall”. This equates to just 5.5W of power per server inclusive of memory, disk and chassis-level overhead. At a fraction of the power of a traditional x86 server node, the Viridis server cluster based on Calxeda EnergyCore will allow datacenter operators to experience an order of magnitude improvement in efficiency.  Said another way,  this platform can power 24 quad-core servers, with 24 SSDs and 96 GB DRAM for about the same or less power consumption as a single low-end two-socket x86 server. So long as the 24 servers can get more work done than the single x86 server for the targeted workload like web serving,  it will substantially reduce datacenter power.
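The per-server figure follows directly from the quoted totals, and a short calculation makes the rounding visible. All inputs below are taken from the post; nothing else is assumed.

```python
# Totals quoted for the 2U Boston Viridis configuration.
WALL_POWER_W = 130.0   # measured "at the wall"
NODES = 24             # Calxeda EnergyCore server nodes
MEMORY_GB = 96
STORAGE_TB = 6

watts_per_node = WALL_POWER_W / NODES       # includes memory, disk, chassis
memory_per_node_gb = MEMORY_GB / NODES
storage_per_node_gb = STORAGE_TB * 1000 / NODES

print(f"{watts_per_node:.2f} W per node")
print(f"{memory_per_node_gb:.0f} GB RAM, {storage_per_node_gb:.0f} GB storage per node")
```

The division gives roughly 5.4 W per node, which the post states as "just 5.5W", along with 4 GB of memory and 250 GB of storage per node.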

If you would like to see these power efficiency enhancements in person, come see the Boston Viridis featured at ARM TechCon 2012 in Santa Clara next week in both the ARM and Canonical booths.

Here is a video of David Borland, Calxeda Co-Founder and VP of Hardware, discussing the Boston Viridis power enhancements and the innovative chassis-level optimizations that our engineering teams worked together to achieve.

The Boston Viridis system optimizes Calxeda EnergyCore technology to achieve unprecedented power performance.

Happy Birthday, EnergyCore! [‘ARM Servers Now’ blog from Calxeda, Nov 5, 2012]

One year of EnergyCore technology

Calxeda introduced its patented EnergyCore technology to the marketplace one year ago last week. In the year since, we have continued to work hard with our ISV and OEM partners to expand the ARM server ecosystem and bring systems to market, and we are pleased with the progress that’s been made.

Five companies now provide EnergyCore-based systems: HP, Boston Ltd., Dell, Penguin Computing, and System Fabric Works. We work closely with our partners to optimize EnergyCore technology for each specific application: we recently detailed how we worked with Boston Ltd. to power-optimize the Viridis system, creating the world’s first 130 W server cluster (24 EnergyCore nodes with 96 GB of memory and 6 TB of storage)–that’s just 5.5 W per complete server. Benchmarks including recent releases from Phoronix have demonstrated that Calxeda systems achieve the promised performance levels, resulting in significant potential TCO savings over incumbent x86 solutions.

In the last year, we also have been pleased to collaborate with our partners to support industry initiatives that advance the adoption of ARM server technology, including OpenStack’s TryStack ARM Zone and the Apache Software Foundation (ASF). These programs are important to the open source community and will help further the adoption of ARM servers.

We are honored to be recognized for our efforts: Calxeda was named one of the Wall Street Journal’s Top 10 Venture Green Companies and listed as one of Business Insider’s 10 Most Disruptive Enterprise Tech Companies. Calxeda was an EETimes/EDN ACE Awards finalist this year, and CEO Barry Evans was nominated as E&Y Entrepreneur of the Year.

And to help us pay bills and invest in the future, we recently closed $55M in additional funding with the continued support of our existing investors plus the addition of Austin Ventures and Vulcan Capital. We are looking to the future and recently described our plans for the next generation of our innovative technology using ARM’s Cortex-A50 series 64-bit cores, which we announced at ARM TechCon last week.

All in all, it’s been a great year, and the momentum continues to grow. Happy Birthday, EnergyCore! The ARM revolution definitely has begun.

Calxeda Lays Out a Vision for the Hyper-Efficient Datacenter [Calxeda press release, Oct 17, 2012]

Plans Include New Platforms for Cloud and Warehouse-Scale Datacenters

Calxeda, the company that first invented the concept of using ARM® technology to slash datacenter power, today announced its vision and roadmap to extend the company’s leadership in the hyper-efficient computing market.  Following the recent announcement of $55 million in additional capital, this news outlines Calxeda’s plans to catalyse rapid market adoption and the creation of an entirely new category of IT products.

The Calxeda EnergyCore ECX-1000, now available, has been called one of the most disruptive technologies in the IT industry today*.  The company has now shipped thousands of early EnergyCore SoCs to OEM customers and end users, and is providing free access to the technology on the OpenStack Trystack.org cloud. The product is now available in servers from Penguin Computing, who announced its partnership with Calxeda today, in addition to long-time partners Boston Limited and Hewlett-Packard.

The Calxeda roadmap implements a two-pronged strategy to reach additional markets. The first enables optimized racks for public and private clouds, while the second will enable and span massive warehouse-scale datacenters.

“We are very excited about the market’s response to our pioneering first generation product,” said Barry Evans, Calxeda’s founder and CEO.  “Now we are taking it two steps further to reinvent the server, first into a rack-based cloud appliance, and then extending into an integrated fleet of computing resources, spanning many thousands of efficient servers.”

Calxeda’s second-generation platform, code-named “Midway,” opens new markets for Calxeda. “It’s all about finding the right balance of I/O, Storage, networking, management, memory and computational elements for each target market segment,” added Evans. “This is the beauty of an ARM-based SoC approach: each platform can be tailored to add more value by addressing the unique needs of a specific workload.”  

To go after cloud applications such as dynamic web hosting and more computationally intensive Big Data analytics, Midway delivers more performance, more memory and hardware virtualization support using standard Cortex-A15 ARM cores. In addition, Calxeda’s second generation fabric will support new features such as dynamic power and routing optimization for public and private clouds. Midway will be available in volume in 2013.

“64-bit ARM architecture-based production servers are years away,” said Patrick Moorhead, president and principal analyst, Moor Insights & Strategy. “Calxeda’s approach to shipping 32-bit technology today and upgrading to the ARM A15 in 2013 makes a lot of sense for specialized workloads in the largest datacenters.”  

Calxeda’s third generation platform, code-named “Lago,” is Calxeda’s platform for the warehouse-scale datacenter.  Built on the 64-bit ARM V8 architecture, Lago features Calxeda’s third generation scaling features, called the Calxeda Fleet Services™, to further automate and optimize common operations at massive scale.  The enhanced fabric will also connect hundreds of thousands of nodes, with quality of service features and the ability to allocate and control resources. 

“We expect to lead the industry with new concepts that will change the datacenter in ways far beyond just lowering power and increasing density,” continued Evans. “Lago will be in the first wave of 64-bit complete systems and application stacks on ARM in 2014, and we are collaborating with key partners to ensure that customers can ramp quickly with production-quality software and OS support for both Midway and Lago.”

Calxeda Trailblazer partners continue to be critical in collaborating to develop the required ecosystem. The Trailblazer initiative provides early access to Calxeda technology for collaborative development and innovation with Calxeda’s engineers and architects. Canonical has been a Trailblazer partner since the program’s inception and shared this:

“Canonical believes that ARM-based servers deliver significant efficiency savings for enterprises. As part of our long term collaboration, we’ve delivered Ubuntu 12.04 LTS on Calxeda hardware,” said Steve George, vice president, Canonical. “Today, we welcome the Calxeda team’s extended roadmap and look forward to continuing our partnership with Calxeda as we bring the benefits of power efficient hyperscale computing to datacenters.”

About Calxeda

Founded in January 2008, Calxeda brings new performance density to the datacenter with revolutionary server-on-a-chip technology. Calxeda currently employs 100 professionals in Austin, Texas and the Silicon Valley area. Calxeda is funded by a unique syndicate comprising industry leading venture capital firms and semiconductor innovators, including ARM Holdings, Advanced Technology Investment Company, Austin Ventures, Battery Ventures, Flybridge Capital Partners, Highland Capital Partners, and Vulcan Capital. See www.calxeda.com for more information.

*http://www.businessinsider.com/10-disruptive-enterprise-tech-companies-2012-9?op=1


Background on Elbrus (in Russian or English if available)

MAIN PRINCIPLES OF E2K ARCHITECTURE [MCST, July 31, 2001]

Presented here is B. Babayan's article “Main Principles of E2K Architecture” in the version published in “Free Software Magazine”, China (Vol.1, Issue 02, Feb 2002 17). The journal's original page numbering is preserved on the site. From our site you can download the translation of the original article in PDF format.

Elbrus (computer) [Wikipedia, Aug 13, 2012]

The Elbrus (Russian: Эльбрус) is a line of Soviet and Russian computer systems developed by the Lebedev Institute of Precision Mechanics and Computer Engineering. In 1992 a spin-off company, Moscow Center of SPARC Technologies (MCST), was created and continued development.

These computers are used in the space program, nuclear weapons research, and defense systems.

MCST develops microprocessors based on 2 different instruction set architectures (ISAs): Elbrus and SPARC.

  • Elbrus 1 (1973) was the fourth generation Soviet computer, developed by Vsevolod Burtsev. Implements tag-based architecture and ALGOL as system language like the Burroughs large systems. A side development was an update of the 1965 BESM-6 as Elbrus-1K2.
  • Elbrus 2 (1977) was a 10-processor computer, considered the first Soviet supercomputer, with superscalar RISC processors. Re-implementation of the Elbrus 1 architecture with faster ECL chips.
  • Elbrus 3 (1986) was a 16-processor computer developed by Boris Babaian. Differing completely from the architecture of both Elbrus 1 and Elbrus 2, it employed a VLIW architecture.
  • Elbrus-90micro (1998-2010) is a computer line based on SPARC instruction set architecture (ISA) microprocessors: MCST R80, R150, R500, R500S and MCST-4R working at 80, 150, 500 and 1000 MHz.
  • Elbrus-3M1 (2005) is a 2-processor computer based on Elbrus 2000 microprocessor employing VLIW architecture working at 300 MHz. It is a further development of the Elbrus 3 (1986).
  • Elbrus МВ3S1/C (2009) is a ccNUMA 4-processor computer based on Elbrus-S microprocessor working at 500 MHz.

Elbrus 2000 [Wikipedia, Dec 9, 2012]

The Elbrus 2000, E2K (Russian: Эльбрус 2000) is a Russian 512-bit wide VLIW microprocessor developed by Moscow Center of SPARC Technologies (MCST) and fabricated by TSMC.

It supports 2 instruction set architecture (ISA):

Thanks to its unique architecture Elbrus 2000 can execute up to 23 instructions per clock so even with its modest clock speed can compete with much faster clocked superscalar microprocessors especially when running in native VLIW mode.

Elbrus 2000 Highlights

  • produced: 2005
  • process: CMOS 0.13 µm
  • clock rate: 300 MHz
  • peak performance: 5.8 GIPS (64 Bit); 9.5 GIPS (32 Bit); 12.3 GIPS (16 Bit); 22.6 GIPS (8 Bit)
  • data format: integer 32, 64; float 32, 64, 80
  • cache: 64 KB L1 instruction cache; 64 KB L1 data cache; 256 KB L2 cache
  • data transfer rate: 9.6 GByte/s to cache; 4.8 GByte/s to main memory
  • transistors: 75.8 million
  • connection layers: 8
  • packing / pins: HFCBGA / 900
  • chip size: 31×31×2.5 mm
  • voltage: 1.05 / 3.3 V
  • power consumption: 6 W

Elbrus-S [Russian Wikipedia, April 30, 2012]

Elbrus-S (1891ВМ5Я) is a Russian microprocessor with a VLIW (EPIC) architecture developed by MCST. It is the next generation of the Elbrus 2000 microprocessor.

The Elbrus-S processor is based on the ELBRUS architecture (ExpLicit Basic Resources Utilization Scheduling), whose distinguishing feature is the deepest parallelization of resources to date for simultaneously executing VLIW instructions. Peak performance is 39.5 GIPS.

Main characteristics of the Elbrus-S microprocessor[1]

  • Process technology: CMOS 0.09 µm
  • Operating clock frequency: 500 MHz
  • Peak performance: 4.0 GFLOPS (64-bit); 8.0 GFLOPS (32-bit)
  • Data width: integer 8, 16, 32, 64; floating point 32, 64, 80
  • Cache: L1 instruction 64 KB; L1 data 64 KB; L2 (unified) 2 MB
  • Page table cache: data 1024 entries; instructions 64 entries
  • Bandwidth: buses to cache 16 GB/s; buses to main memory 8 GB/s; interprocessor links 12 GB/s
  • Die area: 142 mm²[2]
  • Transistor count: 218 million
  • Metal layers: 9[3]
  • Package type / pin count: HFCBGA / 1156
  • Package dimensions: 35×35×3.2 mm
  • Supply voltage: 1.1 / 1.8 / 2.5 V
  • Power dissipation: 13-20 W[4]

Modules

The processor is the basis of the 4-processor МВ3S/C computing module.[5] The module's form factor is CompactPCI 6U. The module contains 8 GB of RAM.[6]

The processor is used together with a peripheral interface controller (KPI) chip, whose testing was completed at the same time as the processor's.[5]

The processors and the module based on them were presented in October 2010 at the ChipEXPO-2010 and Softool exhibitions.[7]

Microprocessor computing systems with the Elbrus architecture and their development [Nov 21, 2008]

A.K. Kim, General Director, OAO I.S. Bruk INEUM
V.Yu. Volkonsky, division head, OAO I.S. Bruk INEUM
Yu.Kh. Sakhin, division head, OAO I.S. Bruk INEUM
S.V. Semenikhin, division head, OAO I.S. Bruk INEUM
V.M. Feldman, division head, OAO I.S. Bruk INEUM
F.A. Gruzdov, department head, OAO I.S. Bruk INEUM
Yu.N. Parakhin, department head, OAO I.S. Bruk INEUM
M.S. Mikhailov, department head, OAO I.S. Bruk INEUM
M.V. Slesarev, research fellow, OAO I.S. Bruk INEUM

The paper examines the architectural features, design principles, and technical characteristics of the Russian Elbrus series of computing systems. To increase performance, they use explicit operation-level parallelism, vector parallelism of operations, parallelism of control threads, and task parallelism. The Russian microprocessors of this series use multicore parallelism of systems on a chip. Explicit operation-level parallelism, combined with special hardware support, is used to provide efficient compatibility with the Intel x86 (IA-32) architectural platform through a dynamic binary translation system that is invisible to the user. Finally, parallelism is used in hardware to support a protected implementation of any programming language, including C and C++. Together, these features make it possible to build general-purpose computing systems of increased reliability and a wide range of uses, from desktop computers and embedded machines to powerful servers and supercomputers.

2. Implementation of the Elbrus microprocessor architecture and the Elbrus-3M1 computing system

The decisive stage of work on implementing the original Russian architecture ended in November 2007 with successful state acceptance tests of the Elbrus microprocessor and the two-processor Elbrus-3M1 computing system based on it. The Elbrus-3M1 ran the Linux operating system ported to it, as well as the MSVS OS. The tests demonstrated that customer software systems developed by various organizations can run efficiently on the Elbrus-3M1. When running these tasks, the 300 MHz Elbrus-3M1 achieved an average speedup of 1.44 times relative to a 1.4 GHz Pentium 4.

2.2. Binary compatibility with the IA-32 architecture

The user of the Elbrus-3M1 is provided with facilities for full binary compatibility with the IA-32 architecture. This is achieved through hardware support for the semantics of IA-32 operations, together with support for a combined software-hardware implementation of compatibility that uses hidden (invisible to the user) dynamic binary translation technology [8-9].

The binary translation system (the binary translator) is designed for highly efficient execution of binary code written for the IA-32 architecture, or for hardware compatible with it (the source platform), on the Elbrus-3M1 computing system (the target platform). The binary translator implements semantic compatibility with the source platform at the virtual machine level and allows arbitrary source-platform code, including the code of any operating system, to run on the Elbrus-3M1.

Binary translation is a high-performance and reliable means of making binary code portable between computers of different architectures [10-11]. The experience of building a binary translation system for the Elbrus-3M1 confirms experimentally that binary-translated code can achieve execution efficiency substantially exceeding that of the source architecture at a comparable clock frequency.

Modern microprocessors that use a superscalar architecture, for example IA-32 microprocessors, first decode complex variable-length instructions in hardware and convert them into simpler, more regular micro-operations. Register renaming is then performed to eliminate false dependencies between micro-operations caused by the limited number of registers in the source instruction set. Some optimizations are carried out along the way; in particular, memory reads are removed from the instruction stream if they are preceded in that stream by writes to the same address. Then, in some implementations, a trace of recoded micro-operations is formed, representing the most likely chain of operations spanning not one but several consecutive straight-line sections of code. This trace is placed in a special hidden memory (the trace cache) for reuse. To obtain the best set of traces, the hardware supports a special learning system that observes the control-transfer operations in the program and tries to predict the branch direction at each point. Finally, the hardware schedules the micro-operations onto the available pool of execution units.

In a software-hardware implementation of compatibility based on binary translation, most of the work of recoding, dependency analysis, selecting the scheduling region, register allocation, and operation scheduling is removed from the hardware and handed to the binary translator. The essence of binary translation is to decompose sequences of binary code of the source architecture and convert them into functionally equivalent sequences of target-architecture code, which are then executed on the target platform's hardware. Unlike instruction-by-instruction interpretation, a common method of providing binary compatibility, binary translation can achieve fairly high efficiency in “executing” source code by optimizing the translated target code, storing it, and re-executing it many times once it has been translated.
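The translate-once, execute-many idea described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration only: the toy "source ISA" (`movi`, `add`, `addi`, `jnz`, `halt`), the register names, and the use of generated Python functions as the "target code" are hypothetical and have nothing to do with the real IA-32 or Elbrus instruction sets or with MCST's translator.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy "source" program: sums 5+4+3+2+1 into r1.
PROGRAM: List[Tuple] = [
    ("movi", "r0", 5),          # 0: r0 = 5 (loop counter)
    ("movi", "r1", 0),          # 1: r1 = 0 (accumulator)
    ("add", "r1", "r1", "r0"),  # 2: r1 = r1 + r0
    ("addi", "r0", "r0", -1),   # 3: r0 = r0 - 1
    ("jnz", "r0", 2),           # 4: if r0 != 0 goto 2
    ("halt",),                  # 5: stop
]


def translate_block(program: List[Tuple], pc: int) -> Callable[[Dict[str, int]], Optional[int]]:
    """Translate the basic block starting at pc into one Python function.

    The generated function updates the register dict and returns the next
    source address, or None when the program halts.
    """
    lines = ["def block(regs):"]
    while True:
        op = program[pc]
        if op[0] == "movi":
            lines.append(f"    regs[{op[1]!r}] = {op[2]}")
        elif op[0] == "add":
            lines.append(f"    regs[{op[1]!r}] = regs[{op[2]!r}] + regs[{op[3]!r}]")
        elif op[0] == "addi":
            lines.append(f"    regs[{op[1]!r}] = regs[{op[2]!r}] + {op[3]}")
        elif op[0] == "jnz":
            # A branch ends the block: return the taken or fall-through address.
            lines.append(f"    return {op[2]} if regs[{op[1]!r}] != 0 else {pc + 1}")
            break
        elif op[0] == "halt":
            lines.append("    return None")
            break
        else:
            raise ValueError(f"unknown opcode {op[0]!r}")
        pc += 1
    namespace: Dict[str, object] = {}
    exec("\n".join(lines), namespace)  # "emit" the target code
    return namespace["block"]


def run_translated(program: List[Tuple]) -> Tuple[Dict[str, int], int]:
    """Run the program through a translation cache keyed by block address."""
    regs: Dict[str, int] = {}
    cache: Dict[int, Callable] = {}
    translations = 0
    pc: Optional[int] = 0
    while pc is not None:
        block = cache.get(pc)
        if block is None:  # first visit: translate and remember
            block = translate_block(program, pc)
            cache[pc] = block
            translations += 1
        pc = block(regs)  # later visits reuse the stored translation
    return regs, translations
```

In this toy run the block starting at address 2 is translated once and executed four times, which is the saving over decoding each instruction on every pass that an interpreter would incur.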

The binary translator for the Elbrus-3M1 is a dynamic binary translator at the virtual machine level, which makes it possible to run the full range of operating systems implemented for the source platform, together with their sets of applications. The main advantage of this mode of operation is therefore its high universality: the Elbrus-3M1 can run any software (including peripheral device drivers) available to users of computers of the source architecture.

The efficiency of the Elbrus-3M1 binary translation system stems from a substantially larger number of execution units than in superscalar architectures (at least twice as many), which is a direct consequence of removing the operation-parallelization logic from the hardware and transferring those functions to the binary translator. Software optimization algorithms examine considerably larger regions of code than the parallelization “window” of superscalar architectures and make it possible to use the full range of execution units. As a result, the Elbrus-3M1 achieves a higher logical speed (execution time at equal clock frequencies) when running programs in IA-32 code, as was demonstrated during the state tests. For example, on none of the 10 tasks of the SPECfp95 suite does the performance of the Elbrus-3M1 (300 MHz) fall below that of the Pentium II (300 MHz), and on average it exceeds it by a factor of 1.75. The average performance of the Elbrus-3M1 is even 1.17 times that of the Pentium III (450 MHz). On a wider class of tasks, the performance of the Elbrus-3M1 when executing IA-32 code is comparable to that of Pentium II, Pentium III, and Pentium 4 class processors running at 300-1500 MHz.
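The notion of “logical speed” can be made concrete by normalizing the quoted SPECfp95 ratios to equal clock frequency. The short Python sketch below is only a back-of-the-envelope illustration: the helper name `per_clock_ratio` is hypothetical, and it assumes that performance scales linearly with clock frequency, which real processors only approximate.

```python
def per_clock_ratio(speedup: float, mhz_a: float, mhz_b: float) -> float:
    """Speedup of machine A over machine B, rescaled to equal clocks.

    Assumes (as a simplification) that performance is proportional to
    clock frequency, so B is scaled from mhz_b to mhz_a.
    """
    return speedup * mhz_b / mhz_a


# Figures quoted in the text for SPECfp95 in IA-32 code:
vs_pentium_ii = per_clock_ratio(1.75, mhz_a=300, mhz_b=300)   # Elbrus-3M1 vs Pentium II
vs_pentium_iii = per_clock_ratio(1.17, mhz_a=300, mhz_b=450)  # Elbrus-3M1 vs Pentium III

print(f"per-clock advantage over Pentium II:  {vs_pentium_ii:.3f}")
print(f"per-clock advantage over Pentium III: {vs_pentium_iii:.3f}")
```

Under that linear-scaling assumption, the 1.17 times advantage over a 450 MHz part corresponds to roughly the same per-clock advantage as the 1.75 figure quoted against the 300 MHz Pentium II.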

The binary translation system is highly reliable. It has successfully run more than 20 IA-32 operating systems on the Elbrus-3M1, including MS DOS, several versions of Windows (95, NT, 2000, XP, and others), Linux, FreeBSD, and QNX. Under these operating systems, more than 1000 popular applications run successfully and efficiently, including interactive computer games, programs from the MS Office suite (MS Word, MS Excel, MS PowerPoint, and others), video clips, data compression and decompression programs, and drivers for all external devices.

2.4. From a large machine to a microprocessor

The architectural line of the Elbrus microprocessor originates from the Elbrus-3 multiprocessor computing complex (MVK), which was created in the Soviet Union in the late 1980s as a continuation of the Elbrus-1 and Elbrus-2 line of computing systems [14]. It was a large machine designed with Soviet-made large-scale integrated circuits. In the 16-processor system, each processor was a separate cabinet. But the architecture of the central processor already included many features that were later embodied in the microprocessor with the Elbrus architecture.

The processor was controlled by a wide instruction, which allowed it to produce up to 7 results of arithmetic-logic operations, and to read up to 6 and write up to 2 64-bit data items from and to memory, in a single machine cycle. The architecture included speculative and predicated operations. Hardware loop support included rotating based registers and a unit for prefetching data from memory into these registers with automatic address advancement. Branch preparation was used to parallelize control flow. Because Elbrus-3 continued the architectural line of Elbrus-1 and Elbrus-2, it was designed for operation-level compatibility with those systems, including support for protected programming based on hardware tags.

The first Elbrus-3 processors were manufactured in 1991, and their debugging began. But the economic changes that began in the country in 1992 halted the project and led to a rethinking of how Russian computing technology should develop. It became clear that computing technology could develop successfully only on the basis of microprocessors. Compatibility with one of the world's widespread microprocessor architectural platforms became an important requirement of the time. All these changes eventually transformed the Elbrus-3 project into the Elbrus-3M1 project, based on the Elbrus microprocessor architecture with explicit instruction-level parallelism, support for a protected implementation of programming languages, and full compatibility with the IA-32 platform through binary translation technology.

The Russian company ZAO MCST has, since 2007, been integrating with OAO I.S. Bruk INEUM into an industry institute in order to accelerate work on new generations of Elbrus-series computing systems. The development program spans more than 10 years and covers improvements to the microprocessors and the computing systems based on them, including the chipset and structural components, as well as system software, including operating systems, compilers, binary translation technology, and high-performance libraries.

DYNAMIC BINARY TRANSLATION SYSTEM X86 → «ELBRUS»
[Nikita Voronov, Vadim Gimpelson, Maxim Maslov, Aleksey Rybakov, Nikita Syusyukalov (ZAO MCST), Oct 31, 2011]

The article describes a dynamic binary translation system developed for translating x86 binary code to the Elbrus architecture. We consider the general scheme of the binary translator's operation and describe our multi-level optimization engine and techniques for reducing translation overhead (long-term storage of translated code and parallel translation). Finally, we investigate the performance of an Elbrus processor running the binary translation system and compare it against several x86 microprocessors.

Keywords: binary translation, virtual machine, co-design virtual machine, Elbrus microprocessor.
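How multi-level optimization and long-term storage of translated code can work together is sketched below in Python. This is a hedged illustration, not the authors' design: the class `TieredTranslator`, the two tier names, the `hot_threshold` parameter, and the dictionary standing in for persistent storage are all assumptions introduced here.

```python
import hashlib
from typing import Dict, Optional


class TieredTranslator:
    """Toy model: cheap translation first, costly optimization only for hot code."""

    def __init__(self, hot_threshold: int = 3, store: Optional[Dict[str, str]] = None):
        self.hot_threshold = hot_threshold
        # Stand-in for long-term storage (e.g. a file on disk) that
        # outlives a single run of the translator.
        self.store: Dict[str, str] = store if store is not None else {}
        self.counts: Dict[str, int] = {}

    @staticmethod
    def key(code: bytes) -> str:
        # Identify a code region by the hash of its source bytes.
        return hashlib.sha256(code).hexdigest()

    def execute(self, code: bytes) -> str:
        """Return which tier would serve this execution of the region."""
        k = self.key(code)
        if k in self.store:
            # An optimized translation already exists: reuse it, no re-translation.
            return "optimized"
        self.counts[k] = self.counts.get(k, 0) + 1
        if self.counts[k] >= self.hot_threshold:
            # The region is hot: pay for the expensive tier once and store the result.
            self.store[k] = "optimized"
            return "optimized"
        return "quick"
```

A second translator instance created with the same `store` serves the region from the optimized tier on its first execution, which is the overhead that long-term storage is meant to avoid.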

4. Experimental results

In conclusion, we present a performance comparison of the full binary translation system running on the Elbrus-S microprocessor (500 MHz) against two x86 microprocessors: the Pentium-M (1000 MHz) and the Atom D510 (1660 MHz). The comparison was carried out on the SPEC 2000 benchmark suite. Figures 8 and 9 show the results for the integer and floating-point benchmarks, respectively.

Fig. 8. Performance comparison results on the SPEC 2000 Int suite

Fig. 9. Performance comparison results on the SPEC 2000 FP suite

The data for the Pentium-M were taken from the official SPEC website. The results for the Atom and Elbrus microprocessors were obtained by the authors, using the same binaries for both measurements. The x86 → Elbrus binary translation system ran with all the technologies described in this article enabled and was built with an optimizing language compiler at a high optimization level.

Binary translation technology [iXBT.com, Nov 3, 2009]

Essence, areas of application, and implementation specifics

  1. Introduction
    1. Classification of BT systems by type (FBTS and ABTS)
    2. Classification of BT systems by task
      1. Cross-platform compatibility
      2. Virtualization
      3. Intra-platform dynamic optimization
      4. Code instrumentation
      5. Helping new architectures enter the market
    3. Interaction of BT with other areas of Computer Science
    4. Key concepts of BT
    5. Analysis of completed projects
    6. Dynamic and static BT
      1. The dynamic approach
      2. The static approach
  2. References
  3. Appendix

Binary translation (BT) is a technology with a fairly long history by now, no official documents describing in detail what has been achieved in the field, and an unpredictable future. Although a number of binary translation systems have been implemented and a series of studies have been conducted at various research centers, still no one uses such systems in everyday work. It remains to this day a promising technology and an attractive research area for many engineers. The question has long been in the air: where are the real binary translation implementations with a chance of becoming globally recognized commercial products?

Below I plan to examine the background to the emergence of binary translation and the reasons why some of the best-known products failed to achieve commercial success, and to focus separately on two mutually complementary approaches: dynamic and static BT.


Background on Elbrus Technologies (in Russian)

Elbrus Technologies website: ARM processors, binary compiler, x86 to ARM [June 9, 2012]

Company

Elbrus Technologies is a young and energetic startup focused on high-tech software projects. Our goal is to create products of superb technical quality that can influence the development of the IT industry as a whole.

Market

Our market is cloud services, data centers, and clusters built on the newest servers with the ARM architecture. Hewlett Packard and Dell have already announced such servers. At present the market for these servers is closed to proprietary software, the overwhelming majority of which is written and compiled for the x86 architecture.

Investment opportunities

The company is seeking a strategic investor to productize its binary translation technology and enter the international market.

Vacancies [Oct 15, 2012]

Elbrus Technologies, a resident of the Skolkovo innovation Foundation, invites an experienced and ambitious developer to join its team in the position of …

Contacts [Aug 31, 2012]

Moscow, ul. Vavilova 24 (10 minutes' walk from Leninsky Prospekt metro station)

Phone: +74991351475

e-mail: info(at)eltechs.com

on sk.ru: /Network / Community / OOO “Elbrus Technologies” /

eltechs.com    Company news   Moscow
Funding needs: 100 million rubles     Co-investment: 9.6 million rubles  [Needs $3.125M  Co-investment $0.3M]
Grant received

Project

Elbrus Technologies is a young and energetic startup focused on high-tech software projects. Our goal is to create products of superb technical quality that can influence the development of the IT industry as a whole.

Market

Our market is cloud services, data centers, and clusters built on the newest servers with the ARM architecture. Hewlett Packard and Dell have already announced such servers. At present the market for these servers is closed to proprietary software, the overwhelming… more

Company

Technology for porting binary code in software from the x86 architecture to the ARM architecture.

Investment opportunities

The company is seeking a strategic investor to productize its binary translation technology and enter the international market.

Team [4]

Vadim Gimpelson, General Director

Maxim Maslov, Technical Director

Anatoly Konukhov, Director of Development

     Our “tough” team 🙂

News [19]

20.11.2012 10:01 by Elbrus Technologies
The World’s First 130 Watt Server Cluster
Gina Longoria Oct 25, 2012 Calxeda’s approach to driving power optimization in the datacenter goes well beyond the processor. We focus on enabling our partners to achieve rack level power…

19.11.2012 15:58 by Elbrus Technologies
Dell wants to tune big data apps for ARM servers
Derrick Harris Oct 24, 2012 Dell is donating an ARM-based server to the Apache Software Foundation so contributors can test their projects on new, energy-efficient hardware architectures…

19.11.2012 14:43 by Elbrus Technologies
Calxeda roadmap leads to 64-bit CPU in 2014
Rick Merritt Oct 17, 2012 SAN JOSE, Calif.–Startup Calxeda has disclosed its two-year road map including its first 64-bit chip just over a week before ARM TechCon, when competitors are expected…

23.10.2012 16:09 by Elbrus Technologies
Russian Startup Working on Intel to ARM Software Emulator
Elbrus Technologies in Moscow is developing an x86 to ARM binary translator for use on ARM-powered servers Lucian Constantin Oct 09, 2012 IDG News Service - A Russian startup company called…

22.10.2012 20:04 by Elbrus Technologies
ARM: the nature of servers is changing
Jack Clark 21.09.2012 The changes associated with the move to cloud computing have already had a huge impact on approaches to server design, and they may also turn out to be the decisive factor that…

7.10.2012 0:07 by Elbrus Technologies
ARM gets weapon in server battle vs. Intel
Rick Merritt Oct 2, 2012 SAN JOSE, Calif. – Russian engineers are developing software to run x86 programs on ARM-based servers. If successful, the software could help lower one of the biggest…

4.10.2012 12:31 by Elbrus Technologies
ARM may gain a trump card in its fight with Intel thanks to Russian developers
Russian engineers from the startup Elbrus Technologies are working on a binary translator that allows applications for traditional x86 desktop and server processors from Intel or AMD to run on energy-efficient ARM chips without recompilation. The aim of the project is to make ARM chips more attractive…

3.10.2012 16:51 by Elbrus Technologies
Applied Micro’s X-Gene server chip ARMed to the teeth
Aug 30, 2012 Ready to take a bite out of x86 servers and Cisco Hot Chips An opportunity to define the future of server processing comes along once every decade or so, and Applied Micro Circuit, a company known for its networking chips and PowerPC-based embedded controllers, wants to move up into the big leagues to take on Intel, Advanced Micro…

26.9.2012 12:29 by Elbrus Technologies
Nvidia Develops High-Performance ARM-Based “Boulder” Microprocessor – Report
Nvidia Reportedly Preps Competitor for AMD Opteron and Intel Xeon Processor for Servers Anton Shilov Sep 21, 2012 Nvidia Corp. is reportedly working on an ultra high-performance system-on-chip based on ARM architecture, which would challenge AMD Opteron and Intel Xeon microprocessors in the server space. The chip is called project Boulder and…

31.8.2012 18:09 by Elbrus Technologies
Reshape Next Generation Cloud and Data Centers
Project Thunder is a family of highly integrated, multi-core SoC processors that will incorporate highly optimized, full custom cores built from the ground up based on 64-bit ARMv8 Instruction Set Architecture (ISA) into an innovative system-on-chip (SoC) that will redefine features, performance, power and cost metrics for the next-generation cloud…

31.8.2012 18:01 by Elbrus Technologies
The Baserock™ Slab. Highly optimized for use with Baserock Embedded Linux system development software
The Baserock™ Slab is a multi-processor server featuring 8 quad-core ARMv7-A CPUs running at 1.33GHz and an on-board high-speed network switch fabric with 5Gbit/s between the CPUs and 2x10Gbit/s external. Each compute node gets additional performance with its own dedicated low-latency mSATA solid state drive. The Slab is designed to deliver…

31.8.2012 17:49 by Elbrus Technologies
Boston Viridis – ARM® Microservers. A server that only uses 5 watts of power!
The Boston Viridis uses the ARM® based Calxeda EnergyCore™ SoCs (Server on Chip) to create a rack mountable 2U server cluster comprising 192 processing cores leading the way towards energy efficient hyperscale computing. The Boston Viridis is a self contained, highly extensible, 48 node ultra-low power ARM® cluster with integral high…

27.8.2012 20:34 by Elbrus Technologies
ARM rides open cloud computing testbed
Rick Merritt July 18, 2012. SAN JOSE – A handful of vendors have created a trial version for ARM-based servers of the OpenStack cloud computing software now available for testing online. The open source offering fills in another small piece of software puzzle for the low power architecture working its way into the data center. ARM server…

27.8.2012 20:23 by Elbrus Technologies
ARM signs 64-bit deal with Cavium
Peter Clarke Aug 1, 2012. LONDON – Fabless networking chip firm Cavium Inc. (San Jose, Calif.) has announced that it is planning to deliver a family of multicore system-chips based on full custom cores designed to implement the 64-bit ARMv8 instruction set architecture from ARM Holdings plc (Cambridge, England). The chips will be aimed at…

27.8.2012 20:09 by Elbrus Technologies
Samsung plans ARM-based CPU for servers, says report
Peter Clarke Aug 6, 2012. LONDON – Samsung Electronics Co. Ltd. is planning to introduce an ARM-based CPU for server applications in 2014, according to a Seoul Economic Daily report in Korean. Intel currently holds 90 percent of the market for server processors, the report said. Samsung is planning to introduce a very low-power processor…

14.7.2012 12:33 by Konstantin Trushkin
ARM and X86 Could Coexist in Data Centers, Says Calxeda
Jun 19, 2012 ARM processors could potentially coexist with x86 processors from Intel or Advanced Micro Devices in server environments, with the use case being similar to CPUs and graphics processors in some supercomputers today, chip maker Calxeda said on Monday. In hybrid server environments x86 processors could do the main processing, while…

14.7.2012 12:17 by Konstantin Trushkin
ARM: Two Licenses for Server Processors Signed
ARM Signs ARMv8/Atlas, Cortex-A15 Licenses for Server Chips April 23, 2012 ARM Holdings, a leading developer of microprocessor technologies for low-power applications, said late on Monday that it has signed two licenses for its intellectual property for use in servers. One undisclosed company has licensed ARMv8-based 64-bit code-named Atlas…

14.7.2012 12:08 by Konstantin Trushkin
ARM Will Impact Servers in 2014, CEO Says
Jan 18, 2012 ARM hopes for a serious impact on the server market starting in 2014 when its 64-bit processor design reaches the market, CEO Warren East said. Server makers have announced experimental systems with low-power ARM processors, which is a big confidence booster for the company, East said during an interview at the Consumer Electronics…

14.7.2012 11:08 by Konstantin Trushkin
Copper enables the ARM server ecosystem
Dell drives innovation for the ARM server ecosystem Enterprises that run large web, cloud and big data environments are constantly seeking new technology to gain competitive advantage and reduce operations cost. This focus is motivating a dramatic interest in ARM-based server technologies as a way to meet these requirements. What is ARM? An…

Сколково (инновационный центр) [Википедия, 13 декабря 2012]

The Skolkovo Innovation Center[1] (the “Russian Silicon Valley”)[2][3] is a modern science and technology innovation complex for developing and commercializing new technologies, under construction in the Moscow region. It is the first science city in post-Soviet Russia to be built from scratch. The complex will provide special economic conditions for companies working in the priority sectors for modernizing the Russian economy: telecommunications and space, biomedical technologies, energy efficiency, information technology, and nuclear technology.[2] Federal Law of the Russian Federation No. 244-FZ, “On the Skolkovo Innovation Center,” was signed by President of the Russian Federation D. A. Medvedev on September 28, 2010.[1]

The complex was originally located in the urban settlement of Novoivanovskoye, near the village of Skolkovo, in the eastern part of the Odintsovsky District of Moscow Oblast, west of the Moscow Ring Road (MKAD) on Skolkovskoye Highway. The territory of the Skolkovo Innovation Center became part of Moscow (Mozhaysky District of the Western Administrative Okrug) on July 1, 2012.[4]

About 21,000 people will live on the site of roughly 400 hectares, and another 21,000 will commute to the innovation center for work every day.[5] The first building, the “Hypercube,” is already complete. The first-phase facilities of the “innograd” are to be commissioned by 2014, and construction of all facilities is to be completed by 2020.

Information and computer technologies cluster

The largest Skolkovo cluster is the information and computer technologies cluster. As of August 15, 2012, 209 companies had joined the IT cluster.[69]

Cluster participants are working on a new generation of multimedia search engines and on effective information security systems. Innovative IT solutions are being actively introduced in education and healthcare. Projects are under way to create new technologies for transmitting (opto-informatics, photonics) and storing information. Mobile applications and analytical software are being developed, including for the financial and banking sectors. The design of wireless sensor networks is another important line of work for the cluster's member companies.[70]

International cooperation

International cooperation is one of the most important elements of Skolkovo's activity. The project's partners include research centers, universities, and large international corporations. Most of the foreign companies plan to open their own centers in Skolkovo in the near future.

  • Finland: Nokia Siemens Networks.
  • Germany: Siemens, SAP.
  • Switzerland: the Swiss technology park Technopark Zurich.
  • United States of America: Microsoft, Boeing, Intel, Cisco, Dow Chemical, IBM.
  • Sweden: Ericsson.
  • France: Alstom.
  • Netherlands: EADS.
  • Austria: Vekselberg and Doris Bures, Austria's Minister for Transport, Innovation and Technology, signed an agreement in Vienna that provides for support of Russian and Austrian companies specializing in research, technology development, and innovation.
  • India: a memorandum was signed between the Skolkovo Foundation and the Tata Group on the possible involvement of the Indian company Tata Sons Limited in Skolkovo-based projects in areas such as communications and information technology, engineering, chemicals, and energy.[77]
  • Italy: agreements were reached on student exchanges between universities of the two countries. Italian professors and lecturers will also be invited to lecture at Russian universities and Skolkovo universities, and to jointly develop scientific and educational programs.
  • South Korea: Vekselberg and the president of the Electronics and Telecommunications Research Institute of the Republic of Korea signed a memorandum of understanding.[78]

Lack of demand for innovation

According to Yuri Ammosov, scientific director of the Innovation Institute at MIPT, as long as there is no demand for innovation in Russia, the innovations created in the “silicon valley” will not be able to put the Russian economy on an innovation-driven path of development.[97] Igor Nikolaev of the company FBK holds the same position.[98][99]

Some critics believe that Russian companies are not concerned with buying and adopting new technologies because they aim for high margins rather than turnover growth: “The competition is not for the consumer but for access to resources, and until that situation changes there will be no demand for innovation.”[100]

Results

  • As of August 2012, the project had a total of 583 resident companies.
  • Since the Foundation began work, 105 grants have been approved for a total of 6,397 million rubles. [$200M as of August, 2012]
  • Of these, 22 grants totaling 597 million rubles were approved between January 1 and April 30, 2012. [$18.7M as of August, 2012]

Commercialization of research results

  • Creation of a prototype shunting diesel locomotive with an asynchronous intelligent hybrid drive, the “SinaraHybrid” (ТЭМ-9Н). Grant amount: 35 million rubles; planned sales: 8.4 billion rubles.
  • Creation of Displair, the world's first interactive screenless (air) display. A beta version has been developed so far. Sales are to start at the end of 2012.