… as one of the crucial issues for that (in addition to the cloud, mobility and Internet-of-Things), via the current tipping point as per Microsoft, and the upcoming revolution in that as per Intel
Satya Nadella, Cloud & Enterprise Group, Microsoft and Om Malik, Founder & Senior Writer, GigaOM [LeWeb YouTube channel, Dec 10, 2013]
And why will I present Big Data after that? For a very simple reason: in my opinion, it is precisely in Big Data that Microsoft’s innovations have reached a point at which its technology has the best chance of becoming dominant and subsequently defining the standard for the IT industry—resulting in “winner-take-all” economies of scale and scope. Whatever Intel adds in terms of “technologies for the next Big Data revolution” will only strengthen Microsoft’s current innovative position. For this reason I will include the upcoming Intel innovations for Big Data here as well.
In this next-gen regard it is also highly recommended to read: Disaggregation in the next-generation datacenter and HP’s Moonshot approach for the upcoming HP CloudSystem “private cloud in-a-box” with the promised HP Cloud OS, based on the four-year-old OpenStack effort with others [‘Experiencing the Cloud’, Dec 12, 2013]!
Now the detailed discussion of Big Data:
The Garage Series: Unleashing Power BI for Office 365 [Office 365 technology blog, Nov 20, 2013]
In this week’s show, host Jeremy Chapman is joined by Michael Tejedor from the SQL Server team to discuss Power BI and show it in action. Power BI for Office 365 is a cloud-based solution that reduces the barriers to deploying a self-service Business Intelligence environment for sharing live Excel-based reports and data queries, along with new features and services that enable ease of data discovery and information access from anywhere. Michael lays out the self-service approach to Power BI as well as how public data can be queried and combined in a unified view within Excel. Then they walk through an end-to-end demo of Excel and Power BI components—Power Query [formerly known as “Data Explorer“], Power Pivot, Power View, Power Map [formerly known as product codename “Geoflow“] and Q&A—as they optimize profitability of a bar and rein in bartenders with data.
Last week Mark Kashman and I went through the administrative controls of managing user access and mobile devices, but this week I’m joined by Michael Tejedor and we shift gears completely to talk data, databases and business intelligence. Back in July we announced Power BI for Office 365, and how using this new service along with the familiar tools within Excel, you can discover, analyze, visualize and share data in powerful ways. Power BI includes Power Query, Power Pivot, Power View, Power Map and Q&A, as well as a host of other Power BI features.
- Power Query [formerly known as “Data Explorer“] is a data search engine allowing you to query data from within your company and from external data sources on the Internet, all within Excel.
- Power Pivot lets you create flexible models within Excel that can process large data sets quickly using SQL Server’s in-memory database.
- Power View allows you to manipulate data and compile it into charts, graphs and other visualizations. It’s great for presentations and reports
- Power Map [formerly known as product codename “Geoflow“] is a 3D data visualization tool for mapping, exploring and interacting with geographic and temporal data.
- Q&A is a natural language query engine that lets users easily query data using common terms and phrases.
In many cases, the process to get custom reports and dashboards from the people running your databases, sales or operations systems is something like submitting a request to your database administrator, followed by a few phone calls or meetings to get what you want. I came from a logistics and operations management background, where it could easily take 2 or 3 weeks to make even minor tweaks to an operational dashboard. Now you can use something familiar—Excel—in a self-service way to hook into your local databases, Excel flat files, modern data sources like Hadoop, or public data sources via Power Query and the data catalogue. All of these data sources can be combined to create powerful insights and data visualizations, and all can be easily and securely shared with the people you work with through the Power BI for Office 365 service.
Of course all of this sounds great, but you can’t really get a feel for it until you see it. Michael and team built out a great demo themed after a bar, using data to track alcohol profitability and pour precision per bartender, and Q&A to query all of this using normal query terms. You’ll want to watch the show to see how everything turns out and to see all of these power tools in action. And if you want to kick the tires and try Power BI for Office 365, you can register for the preview now.
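To make the “unified view” idea concrete, here is a minimal sketch (in Python with its built-in sqlite3 module, emphatically not Power BI itself) of combining two toy data sources with one standard SQL query, echoing the bar demo; all table names and figures are invented:

```python
import sqlite3

# Toy illustration of self-service data combination: a local "pours" log
# joined against a reference "recipe" table in one query, the way Power
# Query merges sources inside Excel. All names and numbers are made up.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE pours(bartender TEXT, drink TEXT, ml_poured REAL);
CREATE TABLE recipe(drink TEXT, ml_target REAL);
INSERT INTO pours VALUES ('Ana','Martini',65),('Ana','Martini',61),
                         ('Ben','Martini',80),('Ben','Martini',78);
INSERT INTO recipe VALUES ('Martini',60);
""")

# One unified view over both sources: average over-pour per bartender.
rows = con.execute("""
    SELECT p.bartender,
           ROUND(AVG(p.ml_poured - r.ml_target), 1) AS avg_overpour_ml
    FROM pours p JOIN recipe r ON p.drink = r.drink
    GROUP BY p.bartender ORDER BY p.bartender
""").fetchall()
print(rows)  # [('Ana', 3.0), ('Ben', 19.0)]
```

The same join-and-aggregate shape is what the show builds visually with Power Pivot and Power View instead of hand-written SQL.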
Intel: technologies for the next Big Data revolution [HP Discover YouTube channel, recorded on Dec 11; published on Dec 12, 2013]
Related “current tipping point” announcements from Microsoft:
From: Organizations Speed Business Results With New Appliances From HP and Microsoft [joint press release, Jan 18, 2011]
New solutions for business intelligence, data warehouse, messaging and database consolidation help increase employee productivity and reduce IT complexity.
… The HP Business Decision Appliance is available now to run business intelligence services ….
Delivering on the companies’ extended partnership announced a year ago, the new converged application appliances from HP and Microsoft are the industry’s first systems designed for IT, as well as end users. They deliver application services such as business intelligence, data warehousing, online transaction processing and messaging. The jointly engineered appliances, and related consulting and support services, enable IT to deliver critical business applications in as little as one hour, compared with potentially months needed for traditional systems.3 One of the solutions already offered by HP and Microsoft — the HP Enterprise Data Warehouse Appliance — delivers up to 200 times faster queries and 10 times the scalability of traditional Microsoft SQL Server deployments.4
With the HP Business Decision Appliance, HP and Microsoft have greatly reduced the time and effort it takes for IT to configure, deploy and manage a comprehensive business intelligence solution, compared with a traditional business intelligence solution where applications, infrastructure and productivity tools are not pre-integrated. This appliance is optimized for Microsoft SQL Server and Microsoft SharePoint and can be installed and configured by IT in less than one hour.
The solution enables end users to share data analyses built with Microsoft’s award-winning5 PowerPivot for Excel 2010 and collaborate with others in SharePoint 2010. It allows IT to centrally audit, monitor and manage solutions created by end users from a single dashboard.
Availability and Pricing6
The HP Business Decision Appliance with three years of HP 24×7 hardware and software support services is available today from HP and HP/Microsoft Frontline channel partners for less than $28,000 (ERP). Microsoft SQL Server 2008 R2 and Microsoft SharePoint 2010 are licensed separately.
The HP Enterprise Data Warehouse Appliance with services for site assessment, installation and startup, as well as three years of HP 24×7 hardware and software support services, is available today from HP and HP/Microsoft Frontline channel partners starting at less than $2 million. Microsoft SQL Server 2008 R2 Parallel Data Warehouse is licensed separately.
3 Based on HP’s experience with customers using HP Business Decision Appliance.
4 SQL Server Parallel Data Warehouse (PDW) has been evaluated by 16 early adopter customers in six different industries. Customers compared PDW with their existing environments and saw typically 40x and up to 200x improvement in query times.
5 Messaging and Online Collaboration Reviews, Nov. 30, 2010, eWEEK.com.
6 Estimated retail U.S. prices. Actual prices may vary.
From: HP Delivers Enterprise Agility with New Converged Infrastructure Solutions [press release, June 6, 2011]
HP today announced several industry-first Converged Infrastructure solutions that improve enterprise agility by simplifying deployment and speeding IT delivery.
Converged Systems accelerate time to application value
HP Converged Systems speed solution deployment by providing a common architecture, management and security model across virtualization, cloud and dedicated application environments. They include:
- HP AppSystem maximizes performance while simplifying deployment and application management. These systems offer best practice operations with a standard architecture that lowers total cost of ownership. Among the new systems are HP Vertica Analytics System, as well as HP Database Consolidation Solution and HP Business Data Warehouse Appliance, which are both optimized for Microsoft SQL Server 2008 R2.
From: Microsoft Expands Data Platform With SQL Server 2012, New Investments for Managing Any Data, Any Size, Anywhere [press release, Oct 12, 2011]
New technologies will give businesses a universal platform for data management, access and collaboration.
… Kummert described how SQL Server 2012, formerly code-named “Denali,” addresses the growing challenges of data and device proliferation by enabling customers to rapidly unlock and extend business insights, both in traditional datacenters and through public and private clouds. Extending on this foundation, Kummert also announced new investments to help customers manage “big data,” including an Apache Hadoop-based distribution for Windows Server and Windows Azure and a strategic partnership with Hortonworks Inc. …
The company also made available final versions of the Hadoop Connectors for SQL Server and Parallel Data Warehouse. Customers can use these connectors to integrate Hadoop with their existing SQL Server environments to better manage data across all types and forms.
SQL Server 2012 delivers a powerful new set of capabilities for mission-critical workloads, business intelligence and hybrid IT across traditional datacenters and private and public clouds. Features such as Power View (formerly Project “Crescent,”) and SQL Server Data Tools (formerly “Juneau”) expand the self-service BI capabilities delivered with PowerPivot, and provide an integrated development environment for SQL Server developers.
From: Microsoft Releases SQL Server 2012 to Help Customers Manage “Any Data, Any Size, Anywhere” [press release, March 6, 2012]
Microsoft’s next-generation data platform releases to manufacturing today.
REDMOND, Wash. — March 6, 2012 — Microsoft Corp. today announced that the latest version of the world’s most widely deployed data platform, Microsoft SQL Server 2012, has released to manufacturing. SQL Server 2012 helps address the challenges of increasing data volumes by rapidly turning data into actionable business insights. Expanding on Microsoft’s commitment to help customers manage any data, regardless of size, both on-premises and in the cloud, the company today also disclosed additional details regarding its plans to release an Apache Hadoop-based service for Windows Azure.
Tackling Big Data
IT research firm Gartner estimates that the volume of global data is growing at a rate of 59 percent per year, with 70 to 85 percent in unstructured form.* Furthering its commitment to connect SQL Server and rich business intelligence tools, such as Microsoft Excel, PowerPivot for Excel 2010 and Power View, with unstructured data, Microsoft announced plans to release an additional limited preview of an Apache Hadoop-based service for Windows Azure in the first half of 2012.
To help customers more cost-effectively manage their enterprise-scale workloads, Microsoft will release several new data warehousing solutions in conjunction with the general availability of SQL Server 2012, slated to begin April 1. This includes a major software update and new half-rack form factors for Microsoft Parallel Data Warehouse appliances, as well as availability of SQL Server Fast Track Data Warehouse reference architectures for SQL Server 2012.
Microsoft Simplifies Big Data for the Enterprise [press release, Oct 24, 2012]
New Apache Hadoop-compatible solutions for Windows Azure and Windows Server enable customers to easily extract insights from big data.
NEW YORK — Oct. 24, 2012 — Today at the O’Reilly Strata Conference + Hadoop World, Microsoft Corp. announced new previews of Windows Azure HDInsight Service and Microsoft HDInsight Server for Windows, the company’s Apache Hadoop-based solutions for Windows Azure and Windows Server. The new previews, available today at http://www.microsoft.com/bigdata, deliver Apache Hadoop compatibility for the enterprise and simplify deployment of Hadoop-based solutions. In addition, delivering these capabilities on the Windows Server and Azure platforms enables customers to use the familiar tools of Excel, PowerPivot for Excel and Power View to easily extract actionable insights from the data.
“Big data should provide answers for business, not complexity for IT,” said David Campbell, technical fellow, Microsoft. “Providing Hadoop compatibility on Windows Server and Azure dramatically lowers the barriers to setup and deployment and enables customers to pull insights from any data, any size, on-premises or in the cloud.”
The company also announced today an expanded partnership with Hortonworks, a commercial vendor of Hadoop, to give customers access to an enterprise-ready distribution of Hadoop with the newly released solutions.
“Hortonworks is the only provider of Apache Hadoop that ensures a 100 percent open source platform,” said Rob Bearden, CEO of Hortonworks. “Our expanded partnership with Microsoft empowers customers to build and deploy on platforms that are fully compatible with Apache Hadoop.”
More information about today’s news and working with big data can be found at http://www.microsoft.com/bigdata.
Choose the Right Strategy to Reap Big Value From Big Data [feature article for the press, Nov 13, 2012]
From devices to storage to analytics, technologies that work together will be key for business’ next information age.
REDMOND, Wash. — Nov. 13, 2012 — It seems the gigabyte is going the way of the megabyte — another humble unit of computational measurement that is becoming less and less relevant. Long live the terabyte, impossibly large, increasingly common.
Consider this: Of all the data that’s been collected in the world, more than 90 percent has been gathered in the last two years alone. According to a June 2011 report from the McKinsey Global Institute, 15 out of 17 industry sectors of the U.S. have more data stored — per company — than the U.S. Library of Congress.
The explosion in data has been catalyzed by several factors. Social media sites such as Facebook and Twitter are creating huge streams of unstructured data in the form of opinions, comments, trends and demographics arising from a vast and growing worldwide conversation.
And then there’s the emerging world of machine-generated information. The rise of intelligent systems and the Internet of Things means that more and more specialized devices are connected to information technology — think of a national retail chain that is connected to every one of its point-of-sale terminals across thousands of locations or an automotive plant that can centrally monitor hundreds of robots on the shop floor.
Combine it all and some industry observers are predicting that the amount of data stored by organizations across industries will increase ten-fold every five years, much of it coming from new streams that haven’t yet been tapped.
It truly is a new information age, and the opportunity is huge. The McKinsey Global Institute estimates that the U.S. health care system, for example, could save as much as $300 billion from more effective use of data. In Europe, public sector organizations alone stand to save 250 billion euros.
In the ever-competitive world of business, data strategy is becoming the next big competitive advantage. According to analyst firm Gartner Group,* “By tapping a continual stream of information from internal and external sources, businesses today have an endless array of new opportunities for: transforming decision-making; discovering new insights; optimizing the business; and innovating their industries.”
According to Microsoft’s Ted Kummert, corporate vice president of the Business Platforms Division, companies addressing this challenge today may wonder where to start. How do you know which data to store without knowing what you want to measure? But then again, how do you know what insights the data holds without having it in the first place?
“There is latent value in the data itself,” Kummert says. “The good news is storage costs are making it economical to store the data. But that still leaves the question of how to manage it and gain value from it to move your business forward.”
With new data services in the cloud such as Windows Azure HDInsight Service and Microsoft HDInsight Server for Windows (Microsoft’s Apache Hadoop-based solutions for Windows Azure and Windows Server), organizations can afford to capture valuable data streams now while they develop their strategy, without making a huge financial bet on a six-month, multimillion-dollar datacenter project.
Just having access to the data, says Kummert, can allow companies to start asking much more complicated questions, combining information sources such as geolocation or weather information with internal operational trends such as transaction volume.
“In the end, big data is not just about holding lots of information,” he says. “It’s about how you harness it. It’s about insight, allowing end users to get the answers they need and doing so with the tools they use every day, whether that’s desktop applications, devices at the network edge or something else.”
His point is often overlooked with all the abstract talk of big data. In the end, it’s still about people, so making it easier for information workers to shift to a new world in which data is paramount is just as important as the information itself. Information technology is great at providing answers, but it still doesn’t know how to ask the right questions, and that’s where having the right analytics tools and applications can help companies make the leap from simply storing mountains of data to actually working with it.
That’s why in the Windows 8 world, Kummert says, the platform is designed to extend from devices and phones to servers and services, allowing companies to build a cohesive data strategy from end to end with the ultimate goal of empowering workers.
“When we talk about the Microsoft big data platform, we have all of the components to achieve exactly that,” Kummert says. “From the Windows Embedded platform to the Microsoft SQL Server stack through to the Microsoft Office stack. We have all the components to collect the data, store it securely and make it easier for information workers to find it — and, more importantly, understand what it means.”
For more information on building intelligent systems to get the most out of business data, please visit the Windows Embedded home page.
* Gartner, “Gartner Says Big Data Creates Big Jobs: 4.4 Million IT Jobs Globally to Support Big Data By 2015,” October 2012
Which data management solution delivers against today’s top six requirements? [The HP Blog Hub, March 25, 2013]
By Manoj Suvarna – Director, Product Management, HP AppSystems
In my last post I talked about the six key requirements I believe a data management solution should deliver against today, namely:
1. High performance
2. Fast time to value
3. Built with Big Data as a priority
4. Simplified management
5. Low cost
6. Proven expertise
Today, 25th March 2013, HP has announced the HP AppSystem for Microsoft SQL Server 2012 Parallel Data Warehouse, a comprehensive data warehouse solution jointly engineered with Microsoft, with a wide array of complementary tools, to effectively manage, store, and unlock valuable business insights.
Let’s take a look at how the solution delivers against each of the key requirements in turn:
1 High performance
With its MPP (Massively Parallel Processing) engine and ‘shared nothing’ architecture, the HP AppSystem for Parallel Data Warehouse can deliver linear scale, starting from a configuration supporting small terabyte requirements all the way up to configurations supporting six petabytes of data.
The solution features the latest HP ProLiant Gen8 servers, with InfiniBand FDR networking, and uses the xVelocity in-memory analytics engine and the xVelocity memory-optimized columnstore index feature in Microsoft SQL Server 2012 to greatly enhance query performance.
The combination of Microsoft software with HP Converged Infrastructure means HP AppSystem for Parallel Data Warehouse offers leading performance for complex workloads, with up to 100x faster query performance and a 30% faster scan rate than previous generations.
2 Fast time to value
HP AppSystem for Parallel Data Warehouse is a factory built, turn-key system, delivered complete from HP’s factory as an integrated set of hardware and software including servers, storage, networking, tools, software, services, and support. Not only is the solution pre-integrated, but it’s backed by unique, collaborative HP and Microsoft support with onsite installation and deployment services to smooth implementation.
3 Built with Big Data as a priority
Designed to integrate with Hadoop, HP AppSystem for Parallel Data Warehouse is ideally suited for “Big Data” environments. This integration allows customers to perform comprehensive analytics on unstructured, semi-structured and structured data, to effectively gain business insights and make better, faster decisions.
4 Simplified management
Providing the optimal management environment has been a critical element of the design, and is delivered through HP Support Pack Utility Suite. This set of tools simplifies updates and several other maintenance tasks across the system to ensure that it is continually running at optimal performance. Unique in the industry, HP Support Pack Utility Suite can deliver up to 2000 firmware updates with the click of a button. In addition, the HP AppSystem for Parallel Data Warehouse is manageable via the Microsoft System Center console, leveraging deep integration with HP Insight Control.
5 Low cost
The HP AppSystem for Parallel Data Warehouse has been designed as part of an end to end stack for data management, integrating data warehousing seamlessly with BI solutions to minimize the cost of ownership.
It has also been re-designed with a new form factor to minimize space and maximize ease of expansion, which means the entry point for a quarter-rack system is approximately 35% less expensive than the previous-generation solution. It is expandable in modular increments up to 64 nodes, so there is no need for the type of fork-lift upgrade that might be required with a proprietary solution, and it is targeted to be approximately half the cost per TB of comparable offerings in the market from Oracle, IBM, and EMC*.
6 Proven expertise
Together, HP and Microsoft have over 30 years’ experience delivering integrated solutions from desktop to datacenter. HP AppSystem for Parallel Data Warehouse completes the portfolio of HP Data Management solutions, which gives customers the ability to deliver insights on any data, of any size, combining best-in-class Microsoft software with HP Converged Infrastructure.
For customers, our ability to deliver on the requirements above ultimately provides agility for faster, lower risk deployment of data management in the enterprise, helping them make key business decisions more quickly and drive more value to the organization.
If you’d like to find out more, please go to www.hp.com/solutions/microsoft/pdw.
HP AppSystem for SQL 2012 Parallel Data Warehouse [HP product page, March 25, 2013]
Rapid time-to-value data warehouse solution
The HP AppSystem for Microsoft SQL Server 2012 Parallel Data Warehouse, jointly engineered, built and supported with Microsoft, is for customers who have realized the limitations and inefficiencies of their legacy data warehouse infrastructure. This converged system solution delivers significant advances over the previous generation solution including:
Enhanced performance and massive scalability
- Up to 100x faster query performance and a 30% faster scan rate
- Ability to start from small terabyte requirements that can linearly scale out to 6 Petabytes for mission critical needs
Minimize costs and management complexity
- Redesigned form factor minimizes space and allows ease of expansion, with significant up-front acquisition savings as well as reduced OPEX requirements for heating, cooling and real estate
- Appliance solution is pre-built and tested as a complete, end-to-end stack — easy to deploy and minimal technical resources required
- Extensive integration of Microsoft and 3rd-party tools allows users to work with familiar tools like Excel as well as within heterogeneous BI environments
- Unique HP Support Pack Utility Suite set of tools significantly simplifies updates and other maintenance tasks to ensure the system is running at optimal performance
Reduce risks and manage change
- Services delivered jointly under a unique collaborative support agreement, integrated across hardware and software, to help avoid IT disruptions and deliver faster resolution to issues
- Backed by more than 48,000 Microsoft professionals—with more than 12,000 Microsoft Certified—one of the largest, most specialized forces of consultants and support professionals for Microsoft environments in the world
PDW is a massively parallel processing data warehousing appliance built for any volume of relational data (with up to 100x performance gains), and it provides the simplest integration with Hadoop.
Unlike other vendors who opt to provide their high-end appliances for a high price or provide a relational data warehouse appliance that is disconnected from their “Big Data” and/or BI offerings, Microsoft SQL Server Parallel Data Warehouse provides both a high-end massively parallel processing appliance that can improve your query response times up to 100x over legacy solutions as well as seamless integration to both Hadoop and with familiar business intelligence solutions. What’s more, it was engineered to lower ongoing costs resulting in a solution that has the lowest price/terabyte in the market.
What’s New in SQL Server 2012 Parallel Data Warehouse
- Up to 50x performance gains with the xVelocity updateable columnstore.
- Up to 100x performance gains over legacy warehouses.
- Seamless integration with “Big Data” using PolyBase.
- Multi-petabyte data capacity.
- 2.5x lower price per terabyte than SQL Server 2008 R2 PDW and lowest price per terabyte in industry.
Built For Big Data with PolyBase
SQL Server 2012 Parallel Data Warehouse introduces PolyBase, a fundamental breakthrough in data processing used to enable seamless integration between traditional data warehouses and “Big Data” deployments.
- Use standard SQL queries (instead of MapReduce) to access and join Hadoop data with relational data.
- Query Hadoop data without IT having to pre-load data first into the warehouse.
- Native Microsoft BI Integration allowing analysis of relational and non-relational data with familiar tools like Excel.
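As a rough analogue of what these bullets describe (not actual PolyBase syntax), the following Python sketch joins a relational table with rows parsed straight from a CSV “log file” standing in for Hadoop data, using plain SQL instead of a MapReduce job; all names and numbers are invented:

```python
import csv, io, sqlite3

# Toy analogue of the PolyBase idea: join relational warehouse data with
# "external" file data using ordinary SQL, with no MapReduce job written
# by hand. PolyBase does this natively over HDFS; here a CSV string
# stands in for the external files.
hdfs_like_log = io.StringIO("product_id,clicks\n1,120\n2,45\n1,30\n")

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE products(product_id INTEGER, name TEXT)")
con.executemany("INSERT INTO products VALUES (?,?)",
                [(1, "Widget"), (2, "Gadget")])

# Surface the external rows to the SQL engine at query time,
# rather than through a separate ETL load into the warehouse proper.
con.execute("CREATE TEMP TABLE ext_clicks(product_id INTEGER, clicks INTEGER)")
reader = csv.DictReader(hdfs_like_log)
con.executemany("INSERT INTO ext_clicks VALUES (?,?)",
                [(int(r["product_id"]), int(r["clicks"])) for r in reader])

# Standard SQL join and GROUP BY across both worlds.
rows = con.execute("""
    SELECT p.name, SUM(e.clicks) AS total_clicks
    FROM products p JOIN ext_clicks e ON p.product_id = e.product_id
    GROUP BY p.name ORDER BY total_clicks DESC
""").fetchall()
print(rows)  # [('Widget', 150), ('Gadget', 45)]
```

The point of the sketch is the shape of the query: one familiar declarative statement spanning relational and non-relational sources, which is exactly what PolyBase exposes in T-SQL.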
Next-Generation Performance at Scale
Scale and perform beyond your traditional SQL Server deployment with PDW’s massively parallel processing (MPP) appliance, which can handle the extremes of your largest mission-critical performance and scale requirements.
- Up to 100x faster than legacy warehouses with xVelocity updateable columnstore.
- Massively Parallel Processing (MPP) architecture that parallelizes and distributes computing for high query concurrency and complexity.
- Rest assured with built-in hardware redundancies for fault tolerance.
- Rely on Microsoft as your single point of contact for hardware and software support.
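The scatter/gather pattern behind an MPP, shared-nothing appliance can be sketched in a few lines; this toy Python example (invented data, four pretend nodes) hash-distributes rows, lets each “node” aggregate only its own partition, and merges the partial results:

```python
from collections import defaultdict

# Sketch of the shared-nothing MPP pattern: rows are hash-distributed
# across nodes, each node aggregates only its own partition (in parallel
# on a real appliance), and a control node merges the partial results.
NODES = 4
rows = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 8)]

# 1. Distribute: each row lands on exactly one node (no shared storage).
partitions = defaultdict(list)
for region, amount in rows:
    partitions[hash(region) % NODES].append((region, amount))

# 2. Each node computes a partial SUM(amount) GROUP BY region.
def node_aggregate(part):
    acc = defaultdict(int)
    for region, amount in part:
        acc[region] += amount
    return acc

partials = [node_aggregate(p) for p in partitions.values()]

# 3. The control node merges partial aggregates into the final answer.
final = defaultdict(int)
for partial in partials:
    for region, subtotal in partial.items():
        final[region] += subtotal

print(dict(sorted(final.items())))  # {'east': 17, 'north': 3, 'west': 13}
```

Because no node ever touches another node’s partition, adding nodes adds capacity roughly linearly, which is the source of the linear-scale claims above.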
Engineered For Optimal Value
Unlike other vendors in the data warehousing space who deliver a high-end appliance at a high price, Microsoft engineered PDW for optimal value by lowering the cost of the appliance.
- Resilient, scalable, high-performance storage features built into the software, lowering hardware costs.
- Compress data up to 15x with the xVelocity updateable columnstore, saving up to 70% of storage requirements.
- Start small with a quarter rack allowing you to right-size the appliance rather than over-acquiring capacity.
- Use the same tools and knowledge as SQL Server, without having to learn new tools or skills for scale-out DW or Big Data.
- Co-engineered with hardware partners for the highest level of product integration, and shipped to your door for the fastest time to value.
- The lowest price/terabyte in the overall appliance market (and 2.5x lower than SQL Server 2008 R2 PDW).
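A small illustration of why columnar storage compresses so well (the 15x figure above combines several techniques; this toy Python sketch shows just run-length encoding over one invented, highly repetitive column):

```python
# Why column stores compress well: values within a single column tend to
# be highly repetitive, so simple schemes like run-length encoding (one
# of several techniques a columnstore combines) collapse them sharply.
column = ["US"] * 500 + ["DE"] * 300 + ["US"] * 200  # invented data

def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) runs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

encoded = run_length_encode(column)
print(encoded)  # [['US', 500], ['DE', 300], ['US', 200]]
print(len(column), "values stored as", len(encoded), "runs")
```

A row store interleaves columns, so runs like these never form; storing each column contiguously is what makes this kind of compression, and the resulting scan-rate gains, possible.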
PolyBase [Microsoft page, Feb 26, 2013]
PolyBase is a fundamental breakthrough in data processing used in SQL Server 2012 Parallel Data Warehouse to enable truly integrated query across Hadoop and relational data.
Complementing Microsoft’s overall Big Data strategy, PolyBase is a breakthrough new technology in the data processing engine of SQL Server 2012 Parallel Data Warehouse, designed as the simplest way to combine non-relational data and traditional relational data in your analysis. While customers would normally burden IT to pre-populate the warehouse with Hadoop data, or undergo extensive training on MapReduce in order to query non-relational data, PolyBase does all this seamlessly, giving you the benefits of “Big Data” without the complexities.
Unifies Relational and Non-relational Data
PolyBase is one of the most exciting technologies to emerge in recent times because it unifies the relational and non-relational worlds at the query level. Instead of learning a new query model like MapReduce, customers can leverage what they already know (T-SQL).
- Integrated Query: Accepts a standard T-SQL query that joins tables containing a relational source with tables in a Hadoop cluster without needing to learn MapReduce.
- Advanced query options: Apart from simple SELECT queries, users can perform JOINs and GROUP BYs on data in the Hadoop cluster.
Enables In-place Queries with Familiar BI Tools
Microsoft Business Intelligence (BI) integration enables users to connect to PDW with familiar tools such as Microsoft Excel, to create compelling visualizations and make key business decisions from structured or unstructured data quickly.
- Integrated BI tools: End users can connect to both relational and Hadoop data with Excel, which abstracts the complexities of both.
- Interactive visualizations: Explore data residing in HDFS using Power View for immersive interactivity and visualizations.
- Query in-place: IT doesn’t have to pre-load or pre-move data from Hadoop into the data warehouse and pre-join the data before end users do the analysis.
Part of an Overall Microsoft Big Data Story
PolyBase is part of an overall Microsoft “Big Data” solution that already includes HDInsight (a 100% Apache Hadoop compatible distribution for Windows Server and Windows Azure), Microsoft Business Intelligence, and SQL Server 2012 Parallel Data Warehouse.
- Integrated with HDInsight: PolyBase can source the non-relational analysis from Microsoft’s 100% Apache Hadoop-compatible distribution, HDInsight.
- Built into PDW: PolyBase is built into SQL Server 2012 Parallel Data Warehouse to bring “Big Data” benefits within the power of a traditional data warehouse.
- Integrated BI tools: PolyBase has native integration with familiar BI tools like Excel (through Power View and PowerPivot).
Announcing Power BI for Office 365 [Office News, July 8, 2013]
Today, at the Worldwide Partner Conference, we announced a new offering–Power BI for Office 365. Power BI for Office 365 is a cloud-based business intelligence (BI) solution that enables our customers to easily gain insights from their data, working within Excel to analyze and visualize the data in…
Yesterday, during Microsoft’s Worldwide Partner Conference, we announced some exciting new Business Intelligence (BI) features available for Excel. Specifically, we announced the expansion of the BI offerings available as part of Power BI—a cloud-based BI solution that enables our customers to easily gain insights from their data, working within Excel to analyze and visualize the data in a self-service way.
Power BI for Office 365 now includes:
- Power Query, enabling customers to easily search and access public data and their organization’s data, all within Excel (formerly known as “Data Explorer“). Download details here.
- Power Map, a 3D data visualization tool for mapping, exploring and interacting with geographic and temporal data (formerly known as product codename “Geoflow“). Download details here.
- Power Pivot for creating and customizing flexible data models within Excel.
- Power View for creating interactive charts, graphs and other visual representations of data.
Clearing up some confusion around the Power BI “Release” [A.J. Mee’s Business Intelligence and Big Data Blog, Aug 13, 2013]
Hey folks. Thanks again for checking out my blog.
Yesterday (8/12/2013), Power BI received some attention from the press. Here’s one of the articles that I had seen talking about the “release” of Power BI:
Some of us inside Microsoft had to address all sorts of questions around this one. For the most part, the questions revolved around the *scope* of what was actually released. You have to remember that Power BI is a broad brand name that takes into account:
* Power Pivot/View/Query/Map (which is available now, for the most part)
* The Office 365 hosting of Power BI applications with cloud-to-on-premise data refresh, Natural Language query, data stewardship, etc.
* The Mobile BI app for Windows and iOS devices
Net-net: we announced the availability of the Mobile app (in preview form). At present, it is only available on Windows 8 devices (x86 or ARM) – no iOS just yet. The rest of the O365 / Power BI offering is yet to come. Check out this article to find out how to sign up.
So, the headline story is really all around the Mobile app. You can grab it today from the Store – just search on “Power BI” and it should be the first app that shows up.
We are announcing a significant update to Power Map Preview for Excel (formerly Project codename “GeoFlow” Preview for Excel) on the Microsoft Download Center. Just over five months ago, we launched the preview of Project codename “GeoFlow” amidst a passionately announced “tour” of global song artists through the years by Amir Netz (see 1:17:00 in the keynote) at the first ever PASS Business Analytics conference in April. The 3D visualization add-in has now become a centerpiece visualization (along with Power View) within the business intelligence capabilities of Microsoft Power BI in Excel, earning the new name Power Map to align with other Excel features (Power Query, Power Pivot, and Power View).
Information workers with their data in Excel have realized the potential of Power Map to identify insights in their geospatial and time-based data that traditional 2D charts cannot. Digital marketers can better target and time their campaigns, while environmentally-conscious companies can fine-tune energy-saving programs across peak usage times. These are just a few examples of how location-based data is coming alive for customers using Power Map, distancing them from competitors who are still staring blankly at a flat table, chart, or map. Feedback like this from customers led us to introduce Power Map with new features across the experience of mapping data, discovering insights, and sharing stories.
From: Microsoft unleashes fall wave of enterprise cloud solutions [press release, Oct 7, 2013]
New Windows Server, System Center, Visual Studio, Windows Azure, Windows Intune, SQL Server, and Dynamics solutions will accelerate cloud benefits for customers.
REDMOND, Wash. — Oct. 7, 2013 — Microsoft Corp. on Monday announced a wave of new enterprise products and services to help companies seize the opportunities of cloud computing and overcome today’s top IT challenges. Complementing Office 365 and other services, these new offerings deliver on Microsoft’s enterprise cloud strategy.
Data platform and insights
As part of its vision to help more people unlock actionable insights from big data, Microsoft next week will release a second preview of SQL Server 2014. The new version offers industry-leading in-memory technologies at no additional cost, giving customers 10 times to 30 times performance improvements without application rewrites or new hardware. SQL Server 2014 also works with Windows Azure to give customers built-in cloud backup and disaster recovery.
For big data analytics, later this month Microsoft will release Windows Azure HDInsight Service, an Apache Hadoop-based service that works with SQL Server and widely used business intelligence tools, such as Microsoft Excel and Power BI for Office 365. With Power BI, people can combine private and public data in the cloud for rich visualizations and fast insights.
How to take full advantage of Power BI in Excel 2013 [News from Microsoft Business UK, Oct 14, 2013]
The launch of Power BI features in Excel 2013 gives users an added range of options for data analysis and gaining business intelligence (BI). Power Query, Power Pivot, Power View, and Power Map work seamlessly together, making it much simpler to discover and visualise data. And for small businesses looking to take advantage of self-service intelligence solutions, this is a major stride forwards.
With Power Query, users can search the entire cloud for data – both public and private. With access to multiple data sources, users can filter, shape, merge, and append the information, without the need to physically bring it in to Excel.
Once your query is shaped and filtered how you want it, you can download it into a worksheet in Excel, into the Data Model, or both. When you have the dataset you need, shaped and formed and properly merged, you can save the query that created it, and share it with other users.
Power Pivot enables users to create their own data models from various sources, structured to meet individual needs. You can customise, extend with calculations and hierarchies, and manage the powerful Data Model that is part of Excel.
The solution works seamlessly and automatically with Power Query, and with other features of Power BI, allowing you to manage and extend your own custom database in the familiar environment of Excel. The entire Data Model in Power Pivot – including tables, columns, calculations and hierarchies – exists as report-ready elements in Power View.
Power View allows users to create engaging, interactive, and insightful visualisations with just a few clicks of their mouse. The tool brings the Data Model alive, turning queries into visual analysis and answers. Data can be presented in a variety of different forms, with the reports easily shareable and open for interactive analysis.
A relatively new addition to Excel – Power Map is a geocentric and temporal mapping feature of Power BI. It brings location data into powerful, engaging 3D map visualisations. This allows users to create location-based reports, visualised over a time continuum, that tour the available data.
Using the features together
Power BI offers a collection of services which are designed to make self-service BI intuitive and collaborative. The solution combines the power and familiarity of Excel with collaboration and cloud-based functionality. This vastly increases users’ capacity to gather, manage and draw insights from data, ensuring they can make the most of business intelligence.
The various features of Power BI can add value independently, but the real value is in integration. When used in conjunction with one another – rather than in silos – the services become more than the sum of their parts. They are designed to work seamlessly together in Excel 2013, supporting users as they look to find data, process it and create visualisations which add value to the decision-making process.
Posted by Alex Boardman
Related upcoming technology announcements from Intel:
GraphBuilder: Revealing hidden structure within Big Data [Intel Labs blog, Dec 6, 2012]
By Ted Willke, Principal Engineer with Intel and the General Manager of the Graph Analytics Operation in Intel Labs.
Big Data. Big. Data. We hear the term frequently used to describe data of unusual size or generated at spectacular velocity, like the amount of social data that Facebook has amassed on us (30 PB in one cluster) or the rate at which sensors at the Large Hadron Collider collect information on subatomic particles (15 PB/year). And it’s often deemed “unstructured or semi-structured” to describe its lack of apparent, well, structure. What’s meant is that this data isn’t organized in a way that can directly answer questions, like a database can if you ask it how many widgets you sold last week.
But Big Data does have structure; it just needs to be discovered from within the raw text, images, video, sensor data, etc., that comprise it. And, companies, led by pioneers like Google, have been doing this for the better part of a decade, using applications that churn through the information using data-parallel processing and convenient frameworks for it, like Hadoop MapReduce. Their systems chop the incoming data into slices, farm it out to masses of machines, which subsequently filter it, order it, sum it, transform it, and do just about anything you’d want to do with it, within the practical limits of the readily available frameworks.
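The slice-and-farm-out pattern described above can be conveyed with a toy in-process word count in Python (this is the MapReduce idea in miniature, not Hadoop itself):

```python
from collections import defaultdict
from itertools import chain

# Toy illustration of the MapReduce pattern: data is cut into slices,
# each slice is mapped independently (as if on separate machines),
# then the intermediate pairs are shuffled and reduced.
def map_phase(slice_of_lines):
    return [(word, 1) for line in slice_of_lines for word in line.split()]

def reduce_phase(pairs):
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

slices = [["big data big"], ["data big"]]
mapped = chain.from_iterable(map_phase(s) for s in slices)
print(reduce_phase(mapped))  # {'big': 3, 'data': 2}
```

Real frameworks add fault tolerance, distributed storage, and the shuffle across machines; the programming model is essentially this.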
But until recently, only the wizards of Big Data were able to rapidly extract knowledge from a different type of structure within the data, a type that is best modeled by tree or graph structures. Imagine the pattern of hyperlinks connecting Wikipedia pages or the connections between Tweeters and Followers on Twitter. In these models, a line is drawn between two bits of information if they are related to each other in some way. The nature of the connection can be less obvious than in these examples and made specifically to serve a particular algorithm. For example, a popular form of machine learning called Latent Dirichlet Allocation (a mouthful, I know) can create “word clouds” of topics in a set of documents without being told the topics in advance. All it needs is a graph that connects word occurrences to the filenames. Another algorithm can accurately guess the type of noun (i.e., person, place, or thing) if given a graph that connects noun phrases to surrounding context phrases.
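The word-occurrence-to-filename graph that the LDA example consumes is easy to picture with a small Python sketch (the filenames and text are invented for illustration):

```python
# Build the bipartite graph connecting word occurrences to filenames,
# the input the text describes for Latent Dirichlet Allocation.
docs = {
    "a.txt": "cloud data cloud",
    "b.txt": "data warehouse",
}

# Each edge relates a word to the file it occurs in,
# weighted by the number of occurrences.
edges = {}
for fname, text in docs.items():
    for word in text.split():
        edges[(word, fname)] = edges.get((word, fname), 0) + 1

print(edges[("cloud", "a.txt")])  # 2
```

At internet scale the same construction must run over billions of lines, which is exactly the job GraphBuilder (introduced below) takes on.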
Many of these graphs are very large, with tens of billions of vertices (i.e., things being related) and hundreds of billions of edges (i.e., the relationships). And, many that model natural phenomena possess power-law degree distributions, meaning that many vertices connect to a handful of others, but a few may have edges to a substantial portion of the vertices. For instance, a graph of Twitter relationships would show that many people only have a few dozen followers while only a handful of celebrities have millions. This is all very problematic for parallel computation in general and MapReduce in particular. As a result, Carlos Guestrin and his crack team at the University of Washington in Seattle have developed a new framework, called GraphLab, that is specifically designed for graph-based parallel machine learning. In many cases, GraphLab can process such graphs 20-50X faster than Hadoop MapReduce. Learn more about their exciting work here.
Carlos is a member of the Intel Science and Technology Center for Cloud Computing, and we started working with him on graph-based machine learning and data mining challenges in 2011. Quickly it became clear that no one had a good story about how to construct large-scale graphs that frameworks like GraphLab could digest. His team was constantly writing scripts to construct different graphs from various unstructured data sources. These scripts ran on a single machine and would take a very long time to execute. Essentially, they were using a labor-intensive, low-performance method to feed information to their elegant high-performance GraphLab framework. This simply would not do.
Scanning the environment, we identified a more general hole in the open source ecosystem: A number of systems were out there to process, store, visualize, and mine graphs but, surprisingly, not to construct them from unstructured sources. So, we set out to develop a demo of a scalable graph construction library for Hadoop. Yes, for Hadoop. Hadoop is not good for graph-based machine learning but graph construction is another story. This work became GraphBuilder, which was demonstrated in July at the First GraphLab Workshop on Large-Scale Machine Learning and open sourced this week at 01.org (under Apache 2.0 licensing).
GraphBuilder not only constructs large-scale graphs fast but also offloads many of the complexities of graph construction, including graph formation, cleaning, compression, partitioning, and serialization. This makes it easy for just about anyone to build graphs for interesting research and commercial applications. In fact, GraphBuilder makes it possible for a Java programmer to build an internet-scale graph for PageRank in about 100 lines of code and a Wikipedia-sized graph for LDA in about 130.
This is only the beginning for GraphBuilder but it has already made a lot of connections. We will continually update it with new capabilities, so please try it out and let us know if you’d value something in particular. And, let us know if you’ve got an interesting graph problem for us to grind through. We are always looking for new revelations.
Intel, Facebook Collaborate on Future Data Center Rack Technologies [press release, Jan 16, 2013]
New Photonic Architecture Promises to Dramatically Change Next Decade of Disaggregated, Rack-Scale Server Designs
- Intel and Facebook* are collaborating to define the next generation of rack technologies that enables the disaggregation of compute, network and storage resources.
- Quanta Computer* unveiled a mechanical prototype of the rack architecture to show the total cost, design and reliability improvement potential of disaggregation.
- The mechanical prototype includes Intel Silicon Photonics Technology, distributed input/output using Intel Ethernet switch silicon, and supports the Intel® Xeon® processor and the next-generation system-on-chip Intel® Atom™ processor code named “Avoton.”
- Intel has moved its silicon photonics efforts beyond research and development, and the company has produced engineering samples that run at speeds of up to 100 gigabits per second (Gbps).
OPEN COMPUTE SUMMIT, Santa Clara, Calif., Jan. 16, 2013 – Intel Corporation announced a collaboration with Facebook* to define the next generation of rack technologies used to power the world’s largest data centers. As part of the collaboration, the companies also unveiled a mechanical prototype built by Quanta Computer* that includes Intel’s new, innovative photonic rack architecture to show the total cost, design and reliability improvement potential of a disaggregated rack environment.
“Intel and Facebook are collaborating on a new disaggregated, rack-scale server architecture that enables independent upgrading of compute, network and storage subsystems that will define the future of mega-datacenter designs for the next decade,” said Justin Rattner, Intel’s chief technology officer, during his keynote address at Open Compute Summit in Santa Clara, Calif. “The disaggregated rack architecture [since renamed RSA (Rack Scale Architecture)] includes Intel’s new photonic architecture, based on high-bandwidth, 100Gbps Intel® Silicon Photonics Technology, that enables fewer cables, increased bandwidth, farther reach and extreme power efficiency compared to today’s copper based interconnects.”
Rattner explained that the new architecture is based on more than a decade’s worth of research to invent a family of silicon-based photonic devices, including lasers, modulators and detectors using low-cost silicon to fully integrate photonic devices of unprecedented speed and energy efficiency. Silicon photonics is a new approach to using light (photons) to move huge amounts of data at very high speeds with extremely low power over a thin optical fiber rather than using electrical signals over a copper cable. Intel has spent the past two years proving its silicon photonics technology was production-worthy, and has now produced engineering samples.
Silicon photonics made with inexpensive silicon rather than expensive and exotic optical materials provides a distinct cost advantage over older optical technologies in addition to providing greater speed, reliability and scalability benefits. Businesses with server farms or massive data centers could eliminate performance bottlenecks and ensure long-term upgradability while saving significant operational costs in space and energy.
Silicon Photonics and Disaggregation Efficiencies
Businesses with large data centers can significantly reduce capital expenditure by disaggregating or separating compute and storage resources in a server rack. Rack disaggregation refers to the separation of those resources that currently exist in a rack, including compute, storage, networking and power distribution into discrete modules. Traditionally, a server within a rack would each have its own group of resources. When disaggregated, resource types can be grouped together and distributed throughout the rack, improving upgradability, flexibility and reliability while lowering costs.
“We’re excited about the flexibility that these technologies can bring to hardware and how silicon photonics will enable us to interconnect these resources with less concern about their physical placement,” said Frank Frankovsky, chairman of the Open Compute Foundation and vice president of hardware design and supply chain at Facebook. “We’re confident that developing these technologies in the open and contributing them back to the Open Compute Project will yield an unprecedented pace of innovation, ultimately enabling the entire industry to close the utilization gap that exists with today’s systems designs.”
By separating critical components from one another, each computer resource can be upgraded on its own cadence without being coupled to the others. This provides increased lifespan for each resource and enables IT managers to replace just that resource instead of the entire system. This increased serviceability and flexibility drives improved total-cost for infrastructure investments as well as higher levels of resiliency. There are also thermal efficiency opportunities by allowing more optimal component placement within a rack.
The mechanical prototype is a demonstration of Intel’s photonic rack architecture for interconnecting the various resources, showing one of the ways compute, network and storage resources can be disaggregated within a rack. Intel will contribute a design for enabling a photonic receptacle to the Open Compute Project (OCP) and will work with Facebook*, Corning*, and others over time to standardize the design. The mechanical prototype includes distributed input/output (I/O) using Intel Ethernet switch silicon, and will support the Intel® Xeon® processor and the next generation, 22 nanometer system-on-chip (SoC) Intel® Atom™ processor, code named “Avoton” available this year.
The mechanical prototype shown today is the next evolution of rack disaggregation with separate distributed switching functions.
Intel and Facebook: A History of Collaboration and Contributions
Intel and Facebook have long been technology collaboration partners on hardware and software optimizations to drive more efficiency and scale for Facebook data centers. Intel is also a founding board member of the OCP, along with Facebook. Intel has several OCP engagements in flight including working with the industry to design OCP boards for Intel Xeon and Intel Atom based processors, support for cold storage with the Intel Atom processor, and common hardware management as well as future rack definitions including enabling today’s photonics receptacle.
Disruptive technologies to unlock the power of Big Data [Intel Labs blog, Feb 26, 2013]
By Ted Willke, Principal Engineer with Intel and the General Manager of the Graph Analytics Operation in Intel Labs.
This week’s announcement by Intel that it’s expanding the availability of the Intel® Distribution for Apache Hadoop* software to the US market is seriously exciting for the employees of this semiconductor giant, especially researchers like me. Why? Why would I say this given the amount of overexposure that Hadoop has received? I mean, isn’t this technology nearly 10 years old already??!! Well, because the only thing I hear more than people touting Hadoop’s promise are people venting frustration in implementing it. Rest assured that Intel is listening. We get that users don’t want to make a career out of configuring Hadoop… debugging it… managing it… and trying to figure out why the “insight” it’s supposed to be delivering often looks like meaningless noise.
Which brings me back to why this is a seriously exciting event for me. With our product teams doing the heavy lifting of making the Hadoop framework less rigid and easier to use while keeping it inexpensive, Intel Labs gets a landing zone for some cool disruptive technologies. In December, I blogged about the launch of our open source scalable graph construction library for Hadoop, called Intel® Graph Builder for Apache Hadoop software (f.k.a. GraphBuilder), and explained how it makes it easy to construct large scale graphs for machine learning and data mining. These structures can yield insights from relationships hidden within a wide range of big data sources, from social media and business analytics to medicine and e-science. Today I’ll delve a bit more into Graph Builder technology and introduce the Intel® Active Tuner for Apache Hadoop software, an auto-tuner that uses Artificial Intelligence (AI) to configure Hadoop for optimal performance. Both technologies will be available in the Intel Distribution.
So, Intel® Graph Builder leverages Hadoop MapReduce to turn large unstructured (or semi-structured) datasets into structured output in graph form. This kind of graph may be mined using graph search of the sort that Facebook recently announced. Many companies would like to construct such graphs out of unstructured datasets, and Graph Builder makes it possible. Beyond search, analysis may be applied to an entire graph to answer questions of the type shown in the figure below. The analysis may be performed using distributed algorithms implemented in frameworks like GraphLab, which I also discussed in my previous post.
Intel® Graph Builder performs extract, transform, and load operations, terms borrowed from databases and data warehousing. And, it does so at Hadoop MapReduce scale. Text is parsed and tokenized to extract interesting features. These operations are described in a short map-reduce program written by the data scientist. This program also defines when two vertices (i.e., features) in the graph are related by an edge. The rule is applied repeatedly to form the graph’s topology (i.e., the pattern of edge relationships between vertices), which is stored via the library. In addition, most applications require that additional tabulated information, or “network information,” be associated with each vertex/edge and the library provides a number of distributed algorithms for these tabulations.
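The construction flow just described (tokenize each record, then apply an edge rule repeatedly to form the topology) can be sketched in a few lines of Python; the records and the co-occurrence edge rule are invented for illustration:

```python
from itertools import combinations

# Sketch of graph construction: extract features by tokenizing each
# record, then apply an "edge rule" to every record to form topology.
records = ["hadoop graph", "graph machine learning", "hadoop learning"]

def tokenize(record):
    return record.split()

edges = set()
for rec in records:
    # Edge rule (invented): two features are related if they
    # co-occur in the same record.
    for u, v in combinations(sorted(set(tokenize(rec))), 2):
        edges.add((u, v))

vertices = {v for e in edges for v in e}
print(len(vertices), len(edges))  # 4 vertices, 5 edges
```

Graph Builder runs this kind of logic as MapReduce jobs, then attaches per-vertex/per-edge "network information" and serializes the result for distributed stores.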
At this point, we have a large-scale graph ready for HDFS, HBase, or another distributed store. But we need to do a few more things to ensure that queries and computations on the graph will scale up nicely, like:
- Cleaning the graph’s structure and checking that it is reasonable
- Compressing the graph and network information to conserve cluster resources
- Partitioning the graph in a way that will minimize cluster communications while load balancing computational effort
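As a minimal sketch of the partitioning step above (real systems use far smarter schemes to cut cluster communication), a deterministic hash partition that keeps each vertex's out-edges on one worker might look like:

```python
from zlib import crc32

def partition(edges, n_parts):
    # Deterministic hash partition: all out-edges of a vertex land in
    # the same part, so per-vertex work stays on one worker. (crc32 is
    # used instead of hash() so assignments are stable across runs.)
    parts = [[] for _ in range(n_parts)]
    for u, v in edges:
        parts[crc32(u.encode()) % n_parts].append((u, v))
    return parts

edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
parts = partition(edges, 2)
print(sum(len(p) for p in parts))  # 4 edges, none lost
```

Hash partitioning balances load but ignores edge locality; minimizing the edges cut between parts, as the bullet notes, is what the library's smarter distributed algorithms are for.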
The Intel Graph Builder library provides efficient distributed algorithms for all of the above, and more, so that data scientists can spend more of their time analyzing data and less of their time preparing it. Enough said. The library will be included in the Intel Distribution shortly and we look forward to your feedback. We are constantly on the hunt for new features as we look to the future of big data.
Whereas Intel® Graph Builder was developed to simplify the programming of emerging applications, Intel® Active Tuner was developed to simplify the deployment of today’s applications by automating the selection of configuration settings that will result in optimal cluster performance. In fact, we initially codenamed this technology “Gunther,” after a well-known circus elephant trainer, because of its ability to train Hadoop to run faster. It’s cruelty-free to boot, I promise. Anyway, many Hadoop configuration parameters need to be tuned for the characteristics of each particular application, such as web search, medical image analysis, audio feature analysis, fraud detection, semantic analysis, etc. This tuning significantly reduces both job execution and query time but is time-consuming and requires domain expertise. If you use Hadoop you know that the common practice is to tune it up using rule-of-thumb settings published by industry leaders. But these recommendations are too general and fail to capture the specific requirements of a given application and cluster resource constraints. Enter the Active Tuner.
Intel® Active Tuner implements a search engine that uses a small number of representative jobs to identify the best configuration from among millions or billions of possible Hadoop configurations. It uses a form of AI known as a genetic algorithm to search out the best settings for the number of maps, buffer sizes, compression settings, etc., constantly striving to derive better settings by combining those from pairs of trials that show the most promise (this is where the genetic part comes in) and deriving future trials from these new combinations. And, the Active Tuner can do this faster and more effectively than a human can using the rules-of-thumb. It can be controlled from a slick GUI in the new Intel Manager for Apache Hadoop, so take it for a test run when you pick up a copy of the Intel Distribution. You may see your cluster performance improve by up to 30% without any hassle.
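A toy genetic search in Python conveys the mechanism (the parameter space and the scoring function are invented stand-ins; Active Tuner evaluates real Hadoop benchmark jobs instead):

```python
import random

random.seed(0)
SPACE = {"maps": [2, 4, 8, 16], "buffer_mb": [32, 64, 128], "compress": [0, 1]}

def score(cfg):
    # Stand-in for running a benchmark job: higher is better, with a
    # purely fictional optimum at maps=8, buffer_mb=128, compress=1.
    return (-abs(cfg["maps"] - 8) / 2
            - abs(cfg["buffer_mb"] - 128) / 32
            + cfg["compress"])

def random_cfg():
    return {k: random.choice(v) for k, v in SPACE.items()}

def crossover(a, b):
    # Combine settings from two promising trials (the "genetic" step).
    return {k: random.choice([a[k], b[k]]) for k in SPACE}

pop = [random_cfg() for _ in range(8)]
baseline = max(score(c) for c in pop)
for _ in range(20):                  # generations of trial jobs
    pop.sort(key=score, reverse=True)
    parents = pop[:4]                # keep the most promising trials
    pop = parents + [crossover(*random.sample(parents, 2)) for _ in range(4)]

best = max(pop, key=score)
print(best)
```

Because the top trials are always carried forward, the best configuration found never gets worse from generation to generation, which is how the tuner can safely keep refining settings across a small number of representative jobs.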
To wrap, these are one-of-a-kind technologies that I think you’ll have fun playing with. And, despite offering quite a lot, Intel® Graph Builder and Intel® Active Tuner are just the beginning. I am very excited by what’s coming next. Intel is moving to unlock the power of Big Data and Intel Labs is preparing to blow it wide open.
*Other names and brands may be claimed as the property of others
Intel Unveils New Technologies for Efficient Cloud Datacenters [press release, Sept 4, 2013]
From New SoCs to Optical Fiber, Intel Delivers Cloud-Optimized Innovations Across Network, Storage, Microservers, and Rack Designs
- The Intel® Atom™ C2000 processor family is the first based on Silvermont micro-architecture, has 13 customized configurations and is aimed at microservers, entry-level networking and cold storage.
- New 64-bit, system-on-chip family for the datacenter delivers up to six times the energy efficiency and up to seven times the performance compared to the previous generation.
- The first live demonstration of a Rack Scale Architecture-based system with high-speed Intel® Silicon Photonics components, including a new MXC connector and ClearCurve* optical fiber developed in collaboration with Corning*, enabling data transfer speeds of up to 1.6 terabits per second at distances of up to 300 meters for greater rack density.
SAN FRANCISCO, Calif., September 4, 2013 – Intel Corporation today introduced a portfolio of datacenter products and technologies for cloud service providers looking to drive greater efficiency and flexibility into their infrastructure to support a growing demand for new services and future innovation.
Server, network and storage infrastructure is evolving to better suit an increasingly diverse set of lightweight workloads, creating the emergence of microserver, cold storage and entry networking segments. By optimizing technologies for specific workloads, Intel will help cloud providers significantly increase utilization, drive down costs and provide compelling and consistent experiences to consumers and businesses.
The portfolio includes the second generation 64-bit Intel® Atom™ C2000 product family of system-on-chip (SoC) designs for microservers and cold storage platforms (code named “Avoton”) and for entry networking platforms (code named “Rangeley”). These new SoCs are the company’s first products based on the Silvermont micro-architecture, a new design built on its leading 22nm Tri-Gate SoC process that delivers significant increases in performance and energy efficiency, and they arrive only nine months after the previous generation.
“As the world becomes more and more mobile, the pressure to support billions of devices and users is changing the very composition of datacenters,” said Diane Bryant, senior vice president and general manager of the Datacenter and Connected Systems Group at Intel. “From leadership in silicon and SoC design to rack architecture and software enabling, Intel is providing the key innovations that original equipment manufacturers, telecommunications equipment makers and cloud service providers require to build the datacenters of the future.”
Intel also introduced the Intel® Ethernet Switch FM5224 silicon which, when combined with the Wind River Open Network Software suite, brings Software Defined Networking (SDN) solutions to servers for improved density and lower power.
Intel also demonstrated the first operational Intel Rack Scale Architecture (RSA)-based rack with Intel® Silicon Photonics Technology, in combination with the disclosure of a new MXC connector and ClearCurve* optical fiber developed by Corning* with requirements from Intel. This demonstration highlights the speed with which Intel and the industry are moving from concept to functionality.
Customized, Optimized Intel® Atom™ SoCs for New and Existing Market Segments
Manufactured using Intel’s leading 22nm process technology, the new Intel Atom C2000 product family features up to eight cores, a TDP range of 6 to 20 watts, integrated Ethernet and support for up to 64 gigabytes (GB) of memory, eight times the previous generation. OVH* and 1&1*, leading global web-hosting services companies, have tested Intel Atom C2000 SoCs and plan to deploy them in their entry-level dedicated hosting services next quarter. The 22 nanometer process technology delivers superior performance and performance per watt.
Intel is delivering 13 specific models with customized features and accelerators that are optimized for particular lightweight workloads such as entry dedicated hosting, distributed memory caching, static web serving and content delivery to ensure greater efficiency. The designs allow Intel to expand into new markets like cold storage and entry-level networking.
For example, the new Intel Atom configurations for entry networking address the specialized needs for securing and routing Internet traffic more efficiently. The product features a set of hardware accelerators called Intel® QuickAssist Technology that improves cryptographic performance. They are ideally suited for routers and security appliances.
By consolidating three communications workloads – application, control and packet processing – on a common platform, providers now have tremendous flexibility. They will be able to meet the changing network demands while adding performance, reducing costs and improving time-to-market.
Ericsson, a world-leading provider of communications technology and services, announced that the blade-based switches used in the Ericsson Cloud System, a solution that enables service providers to add cloud capabilities to their existing networks, will soon include the Intel Atom C2000 SoC product family.
Microserver-Optimized Switch for Software Defined Networking
Network solutions that manage data traffic across microservers can significantly impact the performance and density of the system. The unique combination of the Intel Ethernet Switch FM5224 silicon and the Wind River* Open Network Software suite will enable the industry’s first 2.5GbE, high-density, low-latency SDN Ethernet switch solutions specifically developed for microservers. The solution enhances system-level innovation and complements the integrated Intel Ethernet controller within the Intel Atom C2000 processor. Together, they can be used to create SDN solutions for the datacenter.
Switches using the new Intel Ethernet Switch FM5224 silicon can connect up to 64 microservers, providing up to 30 percent higher node density. They are based on the Intel Open Network Platform reference design announced earlier this year.
First Demonstration of Silicon Photonics-Powered Rack
Maximum datacenter efficiency requires innovation at the silicon, system and rack level. Intel’s RSA design helps industry partners re-architect datacenters for modularity of components (storage, CPU, memory, network) at the rack level. It provides the ability to provision or logically compose resources based on application-specific workload requirements. Intel RSA will also allow for easier replacement and configuration of components when deploying cloud computing, storage and networking resources.
Intel today demonstrated the first operational RSA-based rack equipped with the newly announced Intel Atom C2000 processors, Intel® Xeon® processors, a top-of-rack Intel SDN-enabled switch and Intel Silicon Photonics Technology. As part of the demonstration, Intel also disclosed the new MXC connector and ClearCurve* fiber technology developed by Corning* with requirements from Intel. The fiber connections are specifically designed to work with Intel Silicon Photonics components.
The collaboration underscores the tremendous need for high-speed bandwidth within datacenters. By sending photons over a thin optical fiber instead of electrical signals over a copper cable, the new technologies are capable of transferring massive amounts of data at unprecedented speeds over greater distances. The transfers can be as fast as 1.6 terabits per second at lengths of up to 300 meters throughout the datacenter.
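To put the headline number in perspective, here is a quick back-of-the-envelope calculation (the 1 TB payload is chosen arbitrarily for illustration):

```python
# Back-of-the-envelope: moving 1 terabyte over a 1.6 Tb/s photonic link.
LINK_TBPS = 1.6          # claimed aggregate link rate, terabits per second
PAYLOAD_TB = 8.0         # 1 terabyte = 8 terabits

seconds = PAYLOAD_TB / LINK_TBPS
print(f"1 TB moves in {seconds:.1f} s at {LINK_TBPS} Tb/s")  # 5.0 s
```

At those rates, an entire terabyte crosses the rack in about five seconds, which is what makes rack-level disaggregation of storage and memory plausible in the first place.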
To highlight the growing range of Intel RSA implementations, Microsoft and Intel announced a collaboration to innovate on Microsoft’s next-generation RSA rack design. The goal is to bring even better utilization, economics and flexibility to Microsoft’s datacenters.
The Intel Atom C2000 product family is shipping to customers now, with more than 50 designs for microservers, cold storage and networking. The products are expected to be available in the coming months from vendors including Advantech*, Dell*, Ericsson*, HP*, NEC*, Newisys*, Penguin Computing*, Portwell*, Quanta*, Supermicro*, WiWynn* and ZNYX Networks*.
Intel Brings Supercomputing Horsepower to Big Data Analytics [press release, Nov 19, 2013]
- Intel discloses form factors and memory configuration details of the CPU version of the next generation Intel® Xeon Phi™ processor (code named “Knights Landing“), to ease programmability for developers while improving performance.
- Intel® Xeon® processor-based systems power more than 82 percent of all supercomputers on the recently announced 42nd edition of the Top500 list.
- New Intel® HPC Distribution for Apache Hadoop* and Intel® Cloud Edition for Lustre* software tools bring the benefits of Big Data analytics and HPC together.
- Collaboration with HPC community designed to deliver customized products to meet the diverse needs of customers.
SUPERCOMPUTING CONFERENCE, Denver, Nov. 19, 2013 – Intel Corporation unveiled innovations in HPC and announced new software tools that will help businesses and researchers generate greater insights from their data and solve their most vital business and scientific challenges.
“In the last decade, the high-performance computing community has created a vision of a parallel universe where the most vexing problems of society, industry, government and research are solved through modernized applications,” said Raj Hazra, Intel vice president and general manager of the Technical Computing Group. “Intel technology has helped HPC evolve from a technology reserved for an elite few to an essential and broadly available tool for discovery. The solutions we enable for ecosystem partners for the second half of this decade will drive the next level of insight from HPC. Innovations will include scale through standards, performance through application modernization, efficiency through integration and innovation through customized solutions.”
Accelerating Adoption and Innovation
From Intel® Parallel Computing Centers to Intel® Xeon Phi™ coprocessor developer kits, Intel provides a range of technologies and expertise to foster innovation and adoption in the HPC ecosystem. The company is collaborating with partners to take full advantage of technologies available today, as well as to create the next generation of highly integrated solutions that are easier to program for and more energy-efficient. As part of this collaboration, Intel also plans to deliver customized HPC products to meet the diverse needs of customers. This initiative aims to extend Intel’s standards-based scalable platforms with optimizations that will accelerate the next wave of scientific, industrial and academic breakthroughs.
During the Supercomputing Conference (SC’13), Intel unveiled how the next-generation Intel Xeon Phi product (codenamed “Knights Landing”), available as a host processor, will fit into standard rack architectures and run applications entirely natively instead of requiring data to be offloaded to the coprocessor. This will significantly reduce programming complexity and eliminate “offloading” of the data, improving performance and decreasing latencies caused by memory, PCIe and networking.
Knights Landing will also offer developers three memory options to optimize performance. Unlike other Exascale concepts requiring programmers to develop code specific to one machine, new Intel Xeon Phi processors will provide the simplicity and elegance of standard memory programming models.
In addition, Intel and Fujitsu recently announced an initiative that could potentially replace a computer’s electrical wiring with fiber optic links that carry Ethernet or PCI Express traffic over an Intel® Silicon Photonics link. This enables Intel Xeon Phi coprocessors to be installed in an expansion box, separated from the host Intel Xeon processors, yet function as if they were still located on the motherboard. This allows for a much higher density of installed coprocessors and for scaling compute capacity without affecting host server operations.
Several companies are already adopting Intel’s technology. For example, Fovia Medical*, a world leader in volume rendering technology, created high-definition, 3D models to help medical professionals better visualize a patient’s body without invasive surgery. A demonstration from the University of Oklahoma’s Center for Analysis and Prediction of Storms (CAPS) showed a 2D simulation of an F4 tornado, and addressed how a forecaster will be able to experience an immersive 3D simulation and “walk around a storm” to better pinpoint its path. Both applications use Intel® Xeon® technology.
High Performance Computing for Data-Driven Discovery
Data intensive applications including weather forecasting and seismic analysis have been part of the HPC industry from its earliest days, and the performance of today’s systems and parallel software tools have made it possible to create larger and more complex simulations. However, with unstructured data accounting for 80 percent of all data, and growing 15 times faster than other data1, the industry is looking to tap into all of this information to uncover valuable insight.
Intel is addressing this need with the announcement of the Intel® HPC Distribution for Apache Hadoop* software (Intel® HPC Distribution) that combines the Intel® Distribution for Apache Hadoop software with Intel® Enterprise Edition of Lustre* software to deliver an enterprise-grade solution for storing and processing large data sets. This powerful combination allows users to run their MapReduce applications, without change, directly on shared, fast Lustre-powered storage, making it fast, scalable and easy to manage.
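The “without change” claim is about the programming model: a stock MapReduce job written for HDFS-backed Hadoop needs no Lustre-specific code. As a rough illustration (my own sketch, not Intel’s code), here is the shape of such a job, a Hadoop-streaming-style word count in Python, simulated locally:

```python
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    # Emit (word, 1) for every word: the "map" half of word count.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reducer(pairs):
    # Sum counts per word: the "reduce" half. Hadoop sorts by key
    # between the phases, which the sorted() call stands in for here.
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield word, sum(count for _, count in group)

# Local simulation of the job; on a cluster, Hadoop would run the same
# mapper/reducer over input files, whether they sit on HDFS or on
# shared Lustre-backed storage.
counts = dict(reducer(mapper(["big data", "big insight"])))
print(counts)  # {'big': 2, 'data': 1, 'insight': 1}
```

The point of the Intel HPC Distribution is that the storage layer beneath such a job changes (shared Lustre instead of per-node HDFS), while the job itself stays exactly as written.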
The Intel® Cloud Edition for Lustre* software is a scalable, parallel file system that is available through the Amazon Web Services Marketplace* and allows users to pay-as-you go to maximize storage performance and cost effectiveness. The software is ideally suited for dynamic applications, including rapid simulation and prototyping. In the case of urgent or unplanned work that exceeds a user’s on-premise compute or storage performance, the software can be used for cloud bursting HPC workloads to quickly provision the infrastructure needed before moving the work into the cloud.
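The cloud-bursting workflow described above boils down to a capacity check: run on-premise while the job fits, otherwise provision cloud storage and burst. A minimal decision sketch, with all capacities and names hypothetical:

```python
# Hypothetical on-premise Lustre capacity, for illustration only.
ON_PREM_TB = 500

def placement(committed_tb, urgent_job_tb):
    """Run on-premise while the urgent job fits; otherwise burst to cloud."""
    if committed_tb + urgent_job_tb <= ON_PREM_TB:
        return "on-premise"
    # Here one would provision Intel Cloud Edition for Lustre via the
    # AWS Marketplace and point the urgent job at the cloud file system.
    return "cloud-burst"

print(placement(300, 100))   # fits within on-premise capacity
print(placement(450, 100))   # exceeds capacity, so burst to cloud
```

The pay-as-you-go model matters precisely because the "cloud-burst" branch is taken only for the unplanned peaks, so the cloud file system is paid for only while it exists.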
With numerous vendors announcing pre-configured and validated hardware and software solutions featuring the Intel Enterprise Edition for Lustre at SC’13, Intel and its ecosystem partners are bringing turnkey solutions to market to make big data processing and storage more broadly available, cost effective and easier to deploy. Partners announcing these appliances include Advanced HPC*, Aeon Computing*, ATIPA*, Boston Ltd.*, Colfax International*, E4 Computer Engineering*, NOVATTE* and System Fabric Works*.
Intel Tops Supercomputing Top 500 List
Intel’s HPC technologies are once again featured throughout the 42nd edition of the Top500 list, demonstrating how the company’s parallel architecture continues to be the standard building block for the world’s most powerful supercomputers. Intel-based systems account for more than 82 percent of all supercomputers on the list and 92 percent of all new additions. Within a year after the introduction of Intel’s first Many Core Architecture product, Intel Xeon Phi coprocessor-based systems already make up 18 percent of the aggregated performance of all Top500 supercomputers. The complete Top500 list is available at www.top500.org.
1 From IDC Digital Universe 2020 (2013)
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Intel does not control or audit the design or implementation of third party benchmark data or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmark data are reported and confirm whether the referenced benchmark data are accurate and reflect performance of systems available for purchase.
Fujitsu Lights up PCI Express with Intel Silicon Photonics [The Data Stack blog of Intel, Nov 5, 2013]
Victor Krutul is the Director of Marketing for the Silicon Photonics Operation at Intel. He shares the vision and passion of Mario Paniccia that Silicon Photonics will one day revolutionize the way we build computers and the way computers talk to each other. His other passions are tennis and motorcycles (but not at the same time)!
I am happy to report that Fujitsu announced at its annual Fujitsu Forum on November 5, 2013, that it has worked with Intel to build and demonstrate the world’s first Intel® Optical PCI Express (OPCIe)-based server, enabled by Intel® Silicon Photonics technology. Fujitsu has done some good work in recognizing that OPCIe-powered servers offer several advantages over non-OPCIe-based servers. Rack-based servers, especially 1U and 2U servers, are space and power constrained. OEMs and end users sometimes want to add capabilities such as more storage and CPUs to these servers but are limited because there is simply not enough space for the components, or because packing too many components too close together increases the heat density beyond what the system can cool.
Fujitsu found a way to fix these limitations!
The solution to the power and space density problems is to locate the storage and compute components on a remote blade or tray in such a way that they appear to the CPU to be on the main motherboard. The alternative is to have a pool of hard drives managed by a second server, but that approach requires messages to be sent between the two servers, which adds latency. It is possible to do this with copper cables; however, the distance copper cables can span is limited by electromagnetic interference (EMI). One could use amplifiers and signal conditioners, but these add power and cost. PCI Express cables are also heavy and bulky: I have one of these PCI Express Gen 3, 16-lane cables and it feels like it weighs 20 lbs. Compare this to an MXC cable that carries 10x the bandwidth and weighs one to two pounds depending on length.
Fujitsu took two standard Primergy RX200 servers and added an Intel® Silicon Photonics module to each, along with an Intel-designed FPGA. The FPGA did the necessary signal conditioning to make PCI Express “optical friendly.” Using Intel® Silicon Photonics, they were able to send the PCI Express protocol optically through an MXC connector to an expansion box (see picture below). In the expansion box were several solid-state disks (SSDs) and Xeon Phi co-processors, along with another Silicon Photonics module and FPGA to convert the PCI Express signals back. The beauty of this approach is that the SSDs and Xeon Phis appeared to the RX200 server as if they were on the motherboard. With photons traveling at 186,000 miles per second, the extra latency of traveling down a few meters of cable cannot reliably be measured (it can be calculated to be ~5 ns per meter, or 5 billionths of a second per meter).

So what are the benefits of this approach? Basically there are four. First, Fujitsu was able to increase the storage capacity of the server by utilizing the additional disk drives in the expansion box; the number of drives is determined only by the physical size of the box. Second, they were able to increase the effective CPU capacity of the Xeon E5s in the RX200 server, because the Xeon E5s could now utilize the compute capacity of the Xeon Phi co-processors; in a standard 1U server it would be hard, if not impossible, to incorporate Xeon Phis. Third, total cooling: putting the SSDs in an expansion box allows the system to burn more power, because the cooling is divided between the fans in the 1U server and those in the expansion box. Fourth, what is called cooling density, or how much heat must be removed per cubic centimeter. Let me make up an example.
For simplicity’s sake, let’s say the volume of a 1U server is 1 cubic meter, that there are 3 fans cooling it, and that each fan can cool 333 watts, for a total capacity of roughly 1,000 watts of cooling. If I space components evenly, each fan does its share and I can cool the full 1,000 watts. Now assume I place all the components so that just one fan is cooling them, because there is no room in front of the other two fans. If those components dissipate more than 333 watts, they can’t be cooled. That’s cooling density. The Fujitsu approach solves the SSD expansion problem, the CPU expansion problem, and both the total cooling and cooling density problems.
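Both back-of-the-envelope figures above, the ~5 ns/m fiber latency and the per-fan cooling budget, can be checked with a short sketch (the 1.5 group index for silica fiber is my assumption, not a figure from the announcement):

```python
# Propagation delay in fiber: light travels at c/n, with n ~ 1.5 for silica.
C_M_PER_S = 299_792_458        # speed of light in vacuum, m/s
GROUP_INDEX = 1.5              # assumed refractive index of the fiber

ns_per_meter = GROUP_INDEX / C_M_PER_S * 1e9
print(f"fiber latency: {ns_per_meter:.2f} ns/m")   # matches the ~5 ns/m cited

# Cooling density: the same ~1,000 W load is coolable only if no single
# fan's zone exceeds its ~333 W share of the capacity.
FAN_CAPACITY_W = 333

def can_cool(zone_loads_w):
    return all(load <= FAN_CAPACITY_W for load in zone_loads_w)

print(can_cool([333, 333, 333]))   # load spread evenly: True
print(can_cool([999, 0, 0]))       # same load piled in front of one fan: False
```

The same total wattage is either coolable or not depending purely on where the components sit, which is exactly the point of moving the hot components into the expansion box.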
Go to https://www-ssl.intel.com/content/dam/www/public/us/en/images/research/pci-express-and-mxc-2.jpg if you want to see the PCI Express copper cable vs. the MXC optical cable (you will also see we had a little fun with the whole optical-vs-copper thing).
Besides Intel® Silicon Photonics the Fujitsu demo also included Xeon E5 microprocessors and Xeon Phi co-processors.
Why does Intel want to put lasers in and around computers?
Photonic signaling (aka fiber optics) has two fundamental advantages over copper signaling. First, when electrical signals go down a wire or PCB trace they emit electromagnetic interference (EMI), and when the EMI from one wire or trace couples into an adjacent one it causes noise, which limits the bandwidth-distance product. For example, 10G Ethernet copper cables have a practical limit of 10 meters. Yes, you can put amplifiers or signal conditioners on the cables and make an “active copper cable,” but these add power and cost. Active copper cables are made for 10G Ethernet, and they have a practical limit of 20 meters.
Photons don’t emit EMI the way electrons do, so fiber-based cables can go much farther. For example, with the lower-cost lasers used in data centers today, at 10G you can build 500-meter cables. You can go as far as 80 km if you use a more expensive laser, but these are needed only a fraction of the time in the data center (usually when you are connecting the data center to the outside world).
The other benefit of optical communication is lighter cables. Optical fibers are thin (typically 120 microns) and light. I have heard of situations where large data centers had to reinforce their raised floors because, with all the copper cable, the floor loading limits would otherwise have been exceeded.
So how come optical communications is not used more in the data center today? The answer is cost!
Optical devices made for data centers are expensive. They are made out of expensive and exotic materials like lithium niobate or gallium arsenide: difficult to pronounce, even more difficult to manufacture. The state of the art for these exotic materials is 3-inch wafers with very low yields. The devices are packaged inside gold-lined cans, and sometimes manual assembly is required as technicians “light up” the lasers and align them to the thin fibers. A special index-matching epoxy is used that can cost as much per ounce as gold. The bottom line is that while optical communications can go farther and uses lighter cables, it costs a lot more.
Enter Silicon Photonics! Silicon Photonics is the science of making photonic devices out of silicon in a CMOS fab. (We say “photonics” rather than “optics” because the word “optical” is also used to describe eyeglasses and telescopes.) Silicon is one of the most common elements in the Earth’s crust, so it’s not expensive. Intel has 40+ years of CMOS manufacturing experience and has worked over those 40 years to drive costs down and manufacturing speed up. In fact, Intel currently has over $65 billion of capital investment in CMOS fabs around the world. In short, the vision of Intel® Silicon Photonics is to combine the natural advantages of optical communications with the low-cost advantages of making devices out of silicon in a CMOS fab.
Intel has been working on Intel® Silicon Photonics (SiPh) for over ten years and has begun the process of productizing SiPh. Earlier this year, at the OCP Summit, Intel announced that we have begun the long process of building up our manufacturing capabilities for Silicon Photonics. We also announced that we had sampled early parts to customers.
People often ask me when we will ship our products and how much they will cost. They also ask for all sorts of technical details about our SiPh modules. I tell them that Intel is focusing on a full line of solutions, not a single component technology. What our customers want are complete Silicon Photonics-based solutions that will make computing easier, faster or less costly. Let me cite our record of delivering end-to-end solutions:
Summary of Intel Solution Announcements
January 2013: We made a joint announcement with Facebook at the Open Compute Project (OCP) meeting that we had worked together to design a disaggregated rack architecture (since renamed RSA [Rack Scale Architecture]). This architecture used Intel® Silicon Photonics and allowed the storage and networking to be disaggregated (moved away from the CPU motherboard). The benefit is that users can now choose which components they want to upgrade and are not forced to upgrade everything at the same time.
April 2013: At the Intel Developer Forum we gave the first ever public demonstration of Intel® Silicon Photonics at 100G.
September 2013: We demonstrated a live working Rack Scale Architecture solution using Intel® Silicon Photonics links carrying Ethernet protocol.
September 2013: Joint announcement with Corning of the new MXC connector and ClearCurve fiber solution, capable of transmission over 300 m with Intel® Silicon Photonics at 25G. This reinforced our strategy of delivering a complete solution, including cables and connectors optimized for Intel® Silicon Photonics.
September 2013: An updated demonstration of a solution using Silicon Photonics to send data at 25G over more than 800 meters of multimode fiber – a new world record.
Today: Intel has extended its Silicon Photonics solution leadership with a joint announcement with Fujitsu demonstrating the world’s first Intel® Silicon Photonics link carrying PCI Express protocol.
I hope you will agree with me that Intel is focusing on more than just CPUs or optical modules and will deliver a complete Silicon Photonics solution!