Wednesday, 11 February 2009

ServerWorks pivotal in Intel-Rambus chipset tale

Intel and the biggest PC vendors in the world are relying on ServerWorks, a maker of chipsets, to help sell servers during 2001, making the mysterious firm a key factor in the memory politics currently plaguing the industry.

A little while back, The Register published information about server roadmaps which showed that Intel is relying heavily on ServerWorks technology for its high end microprocessors next year.

ServerWorks, which was formerly known as Reliance, is a privately held company which, we understand, will float (IPO) on one of the US markets in the not too distant future.

You can see how it has quietly been signing deals with some major PC folk by turning to this page here.

Intel, Compaq, IBM, Dell, Acer, SuperMicro, Hewlett-Packard and Fujitsu Siemens have all struck deals with ServerWorks, the most recent being the Big Blue deal. ServerWorks issued a press release on June 21st, which you can find on its Web site here. This shows how ServerWorks and Big Blue are collaborating to intro S/390, RS/6000, and AS/400 technology to give IBM's Netfinity server platform and its "X-rated" architecture a push.

Cos the fact is that in the multiprocessing sector above two CPUs, Rambus technology does not cut it. This is partly due to cost -- the number of megabytes you stuff into servers would make RIMM solutions prohibitively expensive.

ServerWorks is enabling double data rate (DDR) memory in 184-pin configurations for the server market, and so far there is little sign that the company has signed on the dotted line with Rambus to license this technology. Information on ServerWorks' plans for DDR is scant -- a mere few lines show that's what it's doing, but how it's doing it is a different question altogether.

Now we shall have to just wait and see whether the Rambus move to shelter DDR memory suppliers under its wing will include ServerWorks, a key partner to Intel et al, or whether Intel, aided and abetted by various Dramurai and customers, will seek to protect its lucrative server end of the market by using other ways and means.

Meanwhile, for US lawyers who read The Reg, here is Rambus' revised SEC filing for its deal with Intel.

We can't make head or tail of it, being laymen, but if any of you can, please let us know. The ServerWorks saga is going to run and run... ®

Intel, Rambus try to control damage


Intel delays chipset as Rambus falls more
update As expected, Intel delays the new 820 chipset, and the chipmaker says it is working with PC manufacturers to quickly resolve the problem.

Via gaining clients in wake of Intel-Rambus delay
IBM will announce tomorrow that it is using chipsets from Via for three new systems in the wake of the delay of Intel's 820 chipset.

previous coverage
Intel to delay new chipset as Rambus reels
update Intel cancels a planned Monday unveiling of a new chipset that would have enabled the first use of next-generation Rambus memory in PCs, sources say.

Costly new Rambus problem stings PC makers
A major problem involving Rambus memory technology could delay for months computers that were scheduled to debut Monday--the problem could force PC makers to throw away critical parts of new high-end computers or face the prospect of shipping potentially faulty machines.

Intel's New Pentium® III Processors Bring Top Performance And Optimal Battery Life To Mobile PCs

SANTA CLARA, Calif., Sept. 25, 2000 - Intel Corporation today introduced new mobile Pentium® III processors with Intel® SpeedStep™ technology that bring higher performance and optimum battery life to mobile PCs. The world's leading PC manufacturers are introducing full size as well as "thin and light" notebooks based on the new Intel processors, delivering as much as five to six hours of battery life depending on system configuration.

Intel SpeedStep technology is the industry's first dynamic frequency and voltage scaling technology, automatically detecting whether the user is on AC power or battery power to deliver the optimum balance between performance and battery life. The new mobile Pentium III processor 850 MHz featuring Intel SpeedStep technology runs at 1.65 volts in Maximum Performance Mode and automatically drops to 1.35 volts and 700 MHz in Battery Optimized Mode. The mobile Pentium III processor 800 MHz with Intel SpeedStep technology runs at 1.65 volts in Maximum Performance Mode and automatically drops to 1.35 volts and 650 MHz in Battery Optimized Mode. Both consume less than two watts of power to enable longer battery life.
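As a rough sanity check of those figures (a back-of-the-envelope estimate of ours, not a number from Intel's release): dynamic CMOS power scales approximately with voltage squared times frequency, so Battery Optimized Mode should draw roughly half the core power of Maximum Performance Mode:

```latex
\frac{P_{\text{battery}}}{P_{\text{max}}}
  \approx \left(\frac{1.35\,\text{V}}{1.65\,\text{V}}\right)^{2}
  \times \frac{700\,\text{MHz}}{850\,\text{MHz}} \approx 0.55
```

which is consistent with the sub-two-watt figure Intel quotes for both parts.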

The company also introduced a mobile Intel® Celeron™ processor at 700 MHz, the top-performing processor for value mobile PCs. All three processors are available immediately.

"The world's leading PC manufacturers are using Intel mobile technology to provide users with the best combination of high performance and battery life for today's advanced PC and Internet software," said Don MacDonald, marketing director at Intel's Mobile Platforms Group. "Intel SpeedStep technology is helping PC makers deliver near desktop-equivalent performance in smaller, lighter mobile PCs that run longer."

The new processors also take advantage of Intel's QuickStart technology, which automatically places the processor in a power-saving mode of below one-half watt when full performance is not required -- such as between key strokes -- and instantly returns to full performance when needed.

Product | Maximum Performance Mode | Battery Optimized Mode | Pricing (qty 1,000) | Average Power** | Operating Voltage
Mobile Intel® Pentium® III Processor featuring Intel SpeedStep™ technology 850 MHz | 850 MHz | 700 MHz | $722 | under 2 W | 1.35 volts in Battery Optimized Mode
Mobile Intel® Pentium® III Processor featuring Intel SpeedStep™ technology 800 MHz | 800 MHz | 650 MHz | $508 | under 2 W | 1.35 volts in Battery Optimized Mode

Product | Speed | Pricing (qty 1,000) | Average Power** | Operating Voltage
Mobile Intel® Celeron™ Processor 700 MHz | 700 MHz | $181 | under 3 W | 1.6 volts

Intel, the world's largest chip maker, is also a leading manufacturer of computer, networking and communications products. Additional information about Intel is available at www.intel.com/pressroom.

Round-up: Dual-core servers

Dual-core processors deliver many benefits, including much-improved performance per watt, over single-core designs. We examine three servers from the leading vendors to see what this technology can do for your business.

Multi-core technology is having a big impact in the datacentre, with dual-core servers, in particular, now commonplace. Organisations are reaping a number of significant benefits, performance foremost among them. As the name implies, you get two processors on each dual-core chip, effectively doubling the amount of processing the host server can cope with; there are four cores on the latest quad-core chips. Add in other recent technological advances such as bigger cache sizes, plus faster memory and bus speeds, and the latest multi-cores can provide truly staggering increases in computing power. This power can be used to full advantage for consolidation and virtualisation projects, as well as for high-performance clustering and grid computing applications.

A less publicised, but equally important, benefit of multi-core technology is the reduced amount of electricity that processors need, with vendors vying to deliver the best performance-per-watt ratio. That, in turn, means lower operating costs -- not to mention helping to save the planet. These benefits are further amplified by simply being able to do more with less. For example, just by upgrading to dual-core you could halve your current number of servers and still do the same amount of work -- if not more.

In addition to new processors, a lot more has been happening in the server market in terms of storage, networking and manageability, making it very difficult to test and compare like for like. All the more so given that servers come in a range of shapes and sizes -- from free-standing towers, through rack-mount devices to highly space-efficient blade servers.

The aim of this group test, therefore, is not to attempt a direct comparison, but to provide a snapshot of the breadth of the most popular industry-standard dual-core servers currently available. More specifically, we've gathered together three two-way dual-core servers designed to be used by enterprises for general file sharing, as front-end Web servers and for more specialised hosting duties.

We've chosen one specific model each from market leaders Dell, HP and IBM, although it's worth pointing out that the configurations are far from unique and you'll find similar products from all three vendors, and others too. All support both Intel and AMD processors and all offer a range of freestanding, rack-mount and blade implementations.

Bear that in mind, and hopefully this round-up will provide a good insight into what's currently available and will help you investigate further the benefits that dual core and, in due course, quad core can provide to your business.

Dell PowerEdge 1950

With computing power to burn, the PowerEdge 1950 is ideal where high performance is required, such as clustering and Web front-end duties. However, the cramped format does make life more difficult when it comes to database hosting and other backend deployments.

A two-way SMP server designed to accommodate Intel dual-core Xeon processors, the PowerEdge 1950 sits at the top of Dell's 1U rack-mount range. It's not a particularly expandable solution -- there simply isn't room for lots of adaptors or disks -- but that hasn't stopped Dell's designers cramming a lot in to create a server that can be used for a variety of purposes.

The PE 1950 is very solid and well built. No special tools are needed to install or service it, and the whole of the top lifts off for access. A sliding rail kit can be supplied as an optional extra and there's a lockable front bezel to prevent unauthorised tampering and stop the server being switched off accidentally. You can also specify a second, redundant, power supply if required.

The Intel motherboard takes up only a fraction of the space inside the chassis, with two prominent sockets for the 64-bit Xeon processors. The review system came with Woodcrest chips fitted (now referred to as the Xeon 5100 series), which are both faster and more energy-efficient than earlier Intel dual-core designs. However, the amount of power you'll have on tap will depend on the processors chosen, as will the price you'll have to pay.

Dual-core prices continue to fall as new designs are introduced and quad-core products are released. Our review sample, for example, had a pair of mid-range Xeon 5140 chips, clocked at 2.33GHz with a 1,333MHz front side bus (FSB). You can also specify the much faster (3GHz) 5160 chips. At the other end of the scale are the Xeon 5050 processors, also clocked at 3GHz but with a 667MHz FSB. Dell has also recently added quad-core Xeons as an option.

Of course you could start with one processor and add another later as needed, but with such a huge range of options and prices it's worth getting some expert advice. A low-cost configuration with one processor, for example, will probably be more than adequate for basic file and print sharing, but processor performance can have a big impact when it comes to clustering and application hosting. It's also worth bearing in mind that processors need to be matched, and if you don't order what you want up front you could encounter difficulties when upgrading later on. This perhaps explains why very few two-way purchases are ever beefed up with a second processor.

Memory can also have a big effect, both on your wallet and what you can do with the server. There are eight DIMM sockets on the PE 1950, which can accommodate up to 32GB of DDR2, fully buffered, DRAM with optional memory sparing and mirroring capabilities for those looking for maximum reliability. You can start with as little as 256MB, but ours had a more reasonable 4GB -- more than enough for file sharing and a decent amount if you're hosting an intranet server or a small company e-mail system.

There are yet more options when it comes to storage, starting with a choice between standard 3.5-inch internal hard disks or small 2.5-inch notebook-format drives. With 3.5-inch drives the limit is just two, using either Serial ATA (SATA) or Serial Attached SCSI (SAS) connectivity. The SATA disks can hold up to 750GB each, while the biggest SAS drive is limited to 300GB. If you opt for smaller 2.5-inch drives, SAS is your only choice, at a mere 73GB per disk; however, you can cram four 2.5-inch disks into the case, as on our review system.

Our review server also came with a basic integrated RAID controller, although more advanced plug-in RAID adaptors are optionally available. You can also specify a TCP offload engine (TOE) to be enabled as an option on the integrated Gigabit Ethernet network interface, which would be valuable when connecting the server to an iSCSI SAN.

Further expansion is via plug-in adaptors: riser cards provide either two x8 lane PCI-Express slots or a pair of 64-bit 133MHz PCI-X connectors.

On the software front, Dell will factory-install Windows Server 2003, Red Hat Enterprise Linux 4 ES or SUSE Linux Enterprise Server 10. Finally, you get the usual integrated remote management controller plus additional out-of-band management options.

Benchmarks: Intel Core i7 (Nehalem)


Intel's new Nehalem architecture features an integrated memory controller and runs two threads per CPU core. Our extensive benchmark tests reveal how well the new quad-core processors perform in practice.

Five years after AMD, Intel has produced its first CPU with an integrated memory controller. The AMD design was ahead of the game in a number of areas, and market leader Intel has integrated ideas from its competitor into the new Nehalem architecture. Until now, Intel has manufactured its quad-core processors from two dual-core dies. AMD always maintained that there was only one company that could build real quad cores — a distinction that Intel pooh-poohed. Now even that distinction has been lost: Nehalem (Core i7) CPUs consist of a single chip.

But that's not the end of the story. AMD processors communicate between themselves and with peripherals using AMD's Hypertransport, a point-to-point switched interconnect that maintains high bandwidth through ad-hoc independent channels. That technology contrasts with Intel's approach of having chips use the frontside bus to address not only memory but also to connect to other system components, sharing that channel between devices. That's no real disadvantage with single-core systems, and Intel has maintained performance in dual-core and quad-core systems by using large amounts of cache.

However, this old-fashioned way of communicating is a bottleneck for servers with multiple sockets. In the long term, even the 64MB on-chip cache with snoop filtering that Intel offers in its Xeon 7300 chipset or the 16MB Level 3 cache recently introduced into the six-core Dunnington could not help the chip giant remain competitive with AMD in the server field.

Intel's answer is to provide the Nehalem architecture with a technology called Quick Path Interconnect (QPI) that is comparable with Hypertransport. QPI is in the Nehalem desktop variants, codenamed Bloomfield, that are available later this month. The server variant, Gainestown, for two-socket systems is to follow in the first quarter of 2009, according to Intel boss Paul Otellini. Intel plans on introducing Nehalem chips for multi-processor systems in the second half of 2009, and QPI will also be part of Tukwila, the next generation Itanium processor, due at the end of this year.

Nehalem features, test setup & power consumption

Intel has also cribbed a few virtualisation ideas from AMD for the Nehalem architecture. With the introduction of the Barcelona processor, AMD offered Rapid Virtualisation Indexing (RVI) to allow virtual machines direct memory access. Virtualisation specialist VMware enthusiastically backed the AMD technology. The equivalent technology in Intel's Nehalem is called Extended Page Table (EPT).

On top of the ideas borrowed from AMD, Nehalem chips offer a number of additional features. For example, each of the four processor cores can work on two threads at the same time, a refinement of the P4's well-known Hyperthreading architecture. In addition to the four physical cores with their arithmetic and logic units, a further four logical cores are available.

Unlike the AMD equivalent chips, which only support dual-channel DDR2/1066 memory, the Core i7 processors, officially available from 17 November, offer three DDR3/1066 channels. Thus the chips have a theoretical memory bandwidth of 25.5GB/s, compared with the AMD chips' maximum of 16GB/s. Individual Nehalem processors are differentiated by the speed of the QPI interface. On the top model — the Core i7 Extreme 965 — QPI runs at 3.2GHz, but only reaches 2.4GHz on the smaller models.
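As a quick check of that bandwidth figure (our arithmetic, using the numbers quoted above): each 64-bit DDR3/1066 channel moves 8 bytes per transfer, so

```latex
3 \times 8\,\text{bytes} \times 1066\,\text{MT/s} \approx 25.6\,\text{GB/s}
```

in line with the roughly 25.5GB/s theoretical peak claimed for the three-channel Core i7.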

Memory
According to Intel, the new Nehalem processors are specified up to a memory speed of DDR3/1066, while the current Core 2 architecture can be operated with DDR3/1600 memory. But according to the benchmark tool Everest 4.60, the internal memory controller supports up to 1333MHz. It could be that the system would not work stably in all situations at that frequency, so Intel opted for the more conservative specification. For optimal performance, no more than three memory modules should be used: if four DIMMs are fitted, memory performance falls because the command rate, an important memory timing parameter, has to fall back to two clock cycles.

Nehalem processors offer a built-in overclocking feature called Turbo Mode. If a piece of software does not make full demands on all the cores, the chip's internal logic raises the clock speed of the cores that are in use. Last but not least, the Nehalem processors come equipped with SSE4.2, an instruction set extension that should be particularly useful for accelerating the processing of strings, for example in search engines. Programs such as browsers, email clients and word processors could also benefit from the faster string handling offered by SSE4.2.
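To give a flavour of the string handling SSE4.2 adds (an illustrative sketch of ours, not Intel sample code; the buffer contents are invented, and it must be built with SSE4.2 enabled, e.g. gcc -msse4.2), the PCMPISTRI instruction behind the _mm_cmpistri intrinsic can scan 16 bytes for a whole character range in a single instruction:

```cpp
// Find the first decimal digit in a 16-byte chunk using SSE4.2's
// PCMPISTRI string instruction in range-compare mode.
#include <nmmintrin.h>  // SSE4.2 intrinsics
#include <cstdio>
#include <cstring>

int main() {
    const char ranges[16] = {'0', '9'};   // one range pair: '0'..'9'
    char text[16] = {0};                  // zero padding terminates the string
    std::memcpy(text, "price: 42 euros", 15);

    __m128i needle = _mm_loadu_si128(reinterpret_cast<const __m128i*>(ranges));
    __m128i hay    = _mm_loadu_si128(reinterpret_cast<const __m128i*>(text));

    // Returns the index of the first byte of 'hay' that falls within the
    // range, or 16 if no byte matches.
    int idx = _mm_cmpistri(needle, hay,
                           _SIDD_UBYTE_OPS | _SIDD_CMP_RANGES |
                           _SIDD_LEAST_SIGNIFICANT);
    if (idx < 16)
        std::printf("first digit at offset %d: '%c'\n", idx, text[idx]);
    return 0;
}
```

A scalar loop needs a compare per character per range bound; here one instruction tests all 16 bytes at once, which is where the claimed speed-ups for parsers and search engines come from.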

Power consumption
In terms of power consumption, the system with the Nehalem Core i7 965 Extreme processor ranks about the same as Intel's previous best-performing chip, the Core 2 Extreme QX9775, although the Nehalem processor, with 731 million transistors, clearly has fewer electronic circuits than the QX9775 with 820 million. Because Hyperthreading makes more intensive use of the arithmetic units than single-threaded cores do, the new chips draw about the same power overall as the more complex earlier designs despite having fewer transistors.

Power consumption (Watts): shorter bars are better.

HP ProLiant ML370 G5


It can handle a wide range of backend business applications, but HP's Proliant ML370 G5 is over-specified for organisations with more modest requirements.

The first thing you notice about the HP ProLiant ML370 G5 is its size. It's massive, with room for not just the latest dual-core Xeon processors but enough memory, storage and other options to suit a wide range of applications. Because it's very solidly built, the huge desk-side tower housing the ML370 is extremely heavy, requiring two people to lift it. It can also be rack mounted if required, although it ends up 5U high and the rack would need to be well anchored to prevent it tipping -- especially if you choose any of the optional extras. Our review model, for example, came with a redundant second power supply and an additional bank of hot-swap fans. You can also add a lot of storage, making for a very heavy and, at times, quite noisy system.

Still, all that bulk means plenty of space to configure the server to your exact requirements, starting with processors. As with the Dell PowerEdge 1950, HP now supports the latest Intel dual-core Xeon "Woodcrest" chips, and our review system came with a single Xeon 5140 clocked at 2.33GHz with a 1333MHz frontside bus (FSB). However, you can choose from a variety of Xeon 5000 and 5100-series chips and fit up to two on the Intel motherboard to suit a wide range of applications.

Quad-core chips are also available for the ML370 G5. Note, though, that as on the Dell PowerEdge 1950, the faster processors can push the price up significantly so it's worth making sure that any performance gains will actually be exploited and that the rest of the configuration is up to the job.

Similar comments apply when it comes to memory: specifying more than you need will be costly and pointless unless you expect demand to grow in the future. Our review server came with 2GB, which is a good starting point; fully buffered DDR2 DRAM is used throughout, and there's support for ECC, online sparing and memory mirroring for maximum availability. You can also specify an optional second memory board, taking the maximum RAM capacity up to 64GB. This is double the amount that the 1U PowerEdge 1950 can handle, and is great for data-intensive database servers.

The massive tower chassis provides plenty of room for storage. There are eight 2.5-inch hot-swap drive bays on the review machine, but these take up only half of the available space; a second set of eight can also be added. The bays are configured for Serial Attached SCSI (SAS), which is the preferred technology here, HP having recently announced its intention to standardise on 2.5-inch SAS disks across the ProLiant family. An integrated RAID controller comes as standard, while a variety of others can be specified to fit into the two PCI-X and six free PCI Express expansion slots, and there's a huge number of external storage options.

For this review, HP provided a pair of 36GB 10,000rpm drives, but 72GB and 146GB disks are also available, giving a maximum internal capacity of over 2TB, depending on the level of RAID protection configured. SATA disks can also be specified, although 60GB drives are the only option in this case. However, on this kind of server, most buyers will opt for SAS.

A Gigabit Ethernet server adaptor with built-in TCP offload engine is integrated onto the motherboard, and this could be used for connecting to an iSCSI SAN. The ML370 G5 server also gets a new integrated Lights-Out (iLO) remote management processor, adding virtual KVM and power management facilities that enable the ProLiant server to be controlled remotely via a Web browser. Remote management, of course, is also possible using a variety of tools, some of which are included as standard, with others being optional extras.

Finally, the HP hardware is fully certified for all the leading Windows, Linux and Unix operating system implementations. These can also be preinstalled along with selected applications configured to customer specifications, although the range and cost of such services is likely to differ depending on the reseller or system integrator involved.

We were very impressed with what the new ML370 G5 has to offer. It's perhaps a little over the top for basic file and print sharing, but as a database server or as an ERP platform in a larger company it's got everything you might need, and then some.

Intel opens multicore threads library

Intel has released a multicore development library under the "GPLv2 with runtime exception" license. Intel Threading Building Blocks is a cross-platform, portable library aimed at improving the performance of C++ applications on multicore processors, without requiring deep developer understanding of parallel programming theory.

Intel's Threading Building Blocks (TBB) comprises a small runtime of about 120KB, together with "template libraries" linked in at compile time. The libraries implement a task scheduler, memory allocator, and timing counter, along with various generic parallel algorithms, thread-safe containers, and synchronization primitives, according to the company.

Intel says TBB provides an abstraction for parallelism that "avoids the low-level programming inherent in the direct use of threading packages such as p-threads or Windows threads." James Reinders, chief evangelist for Intel's software development products team, said that in practice, "TBB tends to do better, compared to hand-written code, unless it was hand-written by an expert in parallel programming who spends a fair amount of time writing it."

Reinders said that by releasing TBB under an open source license, Intel hopes to see the technology ported to additional architectures and operating systems. Currently, TBB is supported on about a dozen commercial Linux OSes, along with Apple Mac OS and Microsoft Windows. It supports the GNU Compiler Collection (GCC), and Intel's commercial compilers, but was designed to support "any compiler," Intel says. Hardware-wise, it supports currently shipping multi-core processors from Intel.


Intel Threading Building Blocks supported platforms

Reinders said Intel chose the GPLv2 with runtime exception because the GNU C++ libraries use the same license. Similar to the LGPL, the runtime exception prevents TBB's runtime from "infecting" applications with GPL licensing obligations. "We wanted to go with a proven, accepted license," he said.

Reinders said TBB can help developers adapt existing applications to perform well on multi-core processors. They simply choose parts of their C++ applications that could benefit most from parallelism -- searching and indexing functions, for instance -- and then re-write them around the function calls available in TBB. "In most cases, TBB's memory allocator is a drop-in replacement for malloc," he said.
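To illustrate the kind of rewrite Reinders describes (a minimal sketch of ours using the classic TBB 2.0-era interface; the Scale body and the array are invented for the example), a serial loop becomes a body object handed to tbb::parallel_for, and TBB's scheduler carves the iteration range into chunks and spreads them across cores:

```cpp
// Doubling every element of an array in parallel with the TBB 2.0-era API.
#include "tbb/task_scheduler_init.h"
#include "tbb/parallel_for.h"
#include "tbb/blocked_range.h"

struct Scale {  // the "body": applied to each sub-range the scheduler hands out
    float* a;
    void operator()(const tbb::blocked_range<size_t>& r) const {
        for (size_t i = r.begin(); i != r.end(); ++i)
            a[i] *= 2.0f;
    }
};

int main() {
    tbb::task_scheduler_init init;        // start TBB's worker threads
    const size_t N = 1000000;
    float* a = new float[N];
    for (size_t i = 0; i < N; ++i) a[i] = static_cast<float>(i);

    Scale body;
    body.a = a;
    // Split [0, N) into chunks of ~10000 iterations and run them on all cores.
    tbb::parallel_for(tbb::blocked_range<size_t>(0, N, 10000), body);

    delete[] a;
    return 0;
}
```

Note that the application code says nothing about threads, locks or core counts; how chunks map onto hardware is entirely the scheduler's business, which is exactly the abstraction the library is selling.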

TBB has been successfully used in a wide range of applications, according to Intel. It scales up to about 16 processors, and targets applications such as digital content creation, animation, financial services, electronic design and automation, and design simulation.

Availability

The open source-licensed version of Intel Threading Building Blocks is available now, here. TBB 2.0 will also continue to be available under a commercial license, priced at $300 and including a year of technical support, upgrades, and new releases. The commercial version is also included with the recently-launched Intel C++ Compiler Professional Editions 10.0.

More with Multi-core: Optimizing Intel Multi-core Embedded Platforms

Until recently, the vast majority of embedded systems employed uniprocessor designs. The situation has changed due to the availability of an entire family of Intel® multi-core processors that offer greater processing capacity while significantly reducing overall power consumption. Solutions exist for a wide range of applications, from small battery-powered devices to large scale Internet routers.

Getting your software up and running is, in many cases, fairly straightforward. The real opportunity lies in getting the software to make full use of all the processor's cores and associated hardware-accelerated Intel technologies.

This webinar will discuss the latest embedded Intel multi-core processing platforms available today and provide a view into upcoming offerings. It will review the various multi-processing models (e.g. AMP, SMP and BMP) and discuss how each model affects code migration, parallelism, debugging and shared resources. It will then explore the tools available to help developers debug and optimize their multi-core applications, and conclude with a worked example demonstrating how the tools can be used to maximize performance on an Intel quad-core system.

Estimated length:
1 hour, including Q & A

Who should attend:
This one-hour seminar with a short Q&A will be of great interest to embedded software development managers, architects, and developers who are considering using a multi-core processor for an upcoming project.

Prerequisites:
There are no prerequisites for this session.

Presenters:
Bill Graham has over 18 years of experience in the software industry, including embedded and real-time systems development, UML modeling, and object-oriented design. At QNX Software Systems, Bill is responsible for product marketing for core QNX products: the QNX Neutrino RTOS and the QNX Momentics Tool Suite. Prior to QNX, Bill held product management and marketing positions at IBM, Rational, Klocwork, and ObjecTime. Bill holds Bachelor's and Master's degrees in Electrical Engineering from Carleton University in Ottawa, Canada.

Edwin Verplanke is a Platform Solution Architect at Intel Corporation. Edwin holds a Bachelor's and a Master's degree in Computer Science and Electrical Engineering, respectively. For the past 12 years Edwin has focused on communications board design, participated in various standards development covering high-speed interconnects and, more recently, researched multi-core architectures for the embedded market.



Please contact TechOnline's Webinar Support with any questions.
Email: webinar@techonline.com




QNX Software Systems, a Harman International company (NYSE: HAR), is the leading provider of innovative embedded technologies including middleware, development tools, and operating systems. Corporations such as Cisco, Daimler, General Electric, Lockheed Martin, and Logitech depend on QNX technology for a wide range of mission-critical applications. Founded in 1980, QNX Software Systems is headquartered in Ottawa, Canada, and distributes products in over 100 countries worldwide.

Intel Corporation: By advancing silicon technologies and driving industry standards, Intel is leading the convergence of computing and communications to provide whole new ways for people to gain value from technology and transform their world. Intel is meeting the expanding need for innovative, cost-effective and standards-based building blocks in wired and wireless networking and communications infrastructure. Intel's strength in silicon design, integration and high-volume manufacturing delivers high-performance, low-power components at lower costs, providing the flexibility and faster time-to-market necessary in today's communications industry.

About the Intel Communications Alliance
The Intel Communications Alliance is a community of communications and embedded developers and solutions providers committed to the development of modular, standards-based solutions on Intel technologies.

Intel confirms programmable, multi-core chip

IDF Intel claims to have ended the GPGPU era before it even started with the revelation of a new multi-core processor design called Larrabee. At the Intel Developer Forum today, Intel server chip chief Pat Gelsinger confirmed the long rumored processor.

He described Larrabee as a multi-core, programmable part that will use a tweaked version of the x86 instruction set. Intel expects software developers to craft specialized applications for the processor, giving them a boost on some of the most demanding workloads.

"It will be many cores," Gelsinger said. "You can expect that different versions of the processor will have different numbers of cores."

Gelsinger hesitated to elaborate more on the product other than to add that it will reach at least one teraflop.

The part appears to be an offshoot of Intel's terascale processor labs project. The company today demonstrated a non-x86, 80-core chip reaching 2 teraflops, while consuming 191 watts of power. The same chip hit one teraflop at 46 watts and 1.5 teraflops at 93 watts.

Larrabee looks set to compete against so-called GPGPUs or general purpose graphics processors. AMD has been touting the GPGPU concept as a way for a broader set of software developers to take advantage of the strong performance demonstrated by graphics chips from Nvidia and ATI (now part of AMD).

Gelsinger, however, argued that few coders know how to craft multi-threaded, parallel code that can take advantage of the GPUs. Using the x86 architecture with Larrabee helps ease the software burden, since so many developers are familiar with the technology.

"We don't think there is any such thing as a general purpose GPU," Gelsinger said with bravado.

Intel expects to demonstrate a Larrabee chip, likely with tens of cores, next year.

The company has been busy recruiting top engineers from Nvidia and elsewhere over the past few months to fuel the Larrabee effort. A number of university researchers have also been pushing software for this type of technology.

It's expected that customers in the high performance computing field will realize the most benefit from these programmable chips. They'll be able to craft very specific applications to make use of the multi-core design and should see performance gains well beyond what a standard general purpose chip such as Xeon could offer.

Intel is also working to advance similar types of accelerators that will connect to systems via PCI Express. In addition, it's hyping FPGA co-processors that slot into Xeon sockets. ®

Intel readies massive multicore processors

Related Stories

Intel shows off 80-core processor

February 11, 2007

Intel pledges 80 cores in five years

September 26, 2006

Intel expands core concept for chips

December 17, 2004
Ants and beetles have exoskeletons--and chips with 60 and 80 cores are going to need them as well.

Researchers at Intel are working on ways to mask the intricate functionality of massive multicore chips to make it easier for computer makers and software developers to adapt to them, said Jerry Bautista, co-director of Intel's Tera-scale Computing Research Program.

These multicore chips, he added, will also likely contain both x86 processing cores, similar to the brains inside the vast majority of Intel's server and PC chips today, as well as other types of cores. A 64-core chip, for instance, might contain 42 x86 cores, 18 accelerators and four embedded graphics cores.

Some labs and companies such as ClearSpeed Technology, Azul Systems and Riken have developed chips with large numbers of cores--ClearSpeed has one with 96 cores--but the cores are only capable of performing certain types of operations.

The 80-core mystery


Ever since Intel showed off its 80-core prototype processor, people have asked, "Why 80 cores?"

There's actually nothing magical about the number, Bautista and others have said. Intel wanted to make a chip that could perform 1 trillion floating-point operations per second, known as a teraflop. Eighty cores did the trick. The chip does not contain x86 cores, the kind of cores inside Intel's PC chips, but cores optimized for floating point (or decimal) math.

Other sources at Intel pointed out that 80 cores also allowed the company to maximize the room inside the reticle, the mask used to direct light from a lithography machine to a photo-resistant silicon wafer. Light shining through the reticle creates a pattern on the wafer, and the pattern then serves as a blueprint for the circuits of a chip. More cores, and Intel would have needed a larger reticle.

Last year, Intel showed off a prototype chip with 80 computing cores. While the semiconductor world took note of the achievement, the practical questions immediately arose: Will the company come out with a multicore chip with x86 cores? (The prototype doesn't have them.) Will these chips run existing software and operating systems? How do you solve data traffic, heat and latency problems?

Intel's answer essentially is, yes, and we're working on it.

One idea, proposed in a paper released this month at the Programming Language Design and Implementation Conference in San Diego, involves cloaking all of the cores in a heterogeneous multicore chip in a metaphorical exoskeleton so that all of the cores look like a series of conventional x86 cores, or even just one big core.

"It will look like a pool of resources that the run time will use as it sees fit," Bautista said. "It is for ease of programming."

A paper at the International Symposium on Computer Architecture, also in San Diego, details a hardware scheduler that will split up computing jobs among various cores on a chip. With the scheduler, certain computing tasks can be completed in less time, Bautista noted. It also can prevent the emergence of "hot spots"--if a single processor core starts to get warm because it's been performing nonstop, the scheduler can shift computing jobs to a neighbor.

Intel is also tinkering with ways to let multicore chips share caches, pools of memory embedded in processors for rapid data access. Cores on many dual- and quad-core chips on the market today share caches, but it's a somewhat manageable problem.

"When you get to eight and 16 cores, it can get pretty complicated," Bautista said.

The technology would prioritize operations. Early indications show that improved cache management could improve overall chip performance by 10 percent to 20 percent, according to Intel.

As with the look-and-feel technology for heterogeneous chips, programmers ideally won't have to understand or deliberately accommodate the cache-sharing or hardware-scheduling technologies. These operations will largely be handled by the chip itself and be obscured from view.

Intel's 80-core chips

Heat is another issue that will need to be contained. Right now, I/O (input-output) systems need about 10 watts of power to shuttle data at 1 terabit per second. An Intel lab has developed a low-power I/O system that can transfer 5 gigabits per second at 14 milliwatts--which is less than 14 percent of the power used by current 5Gbps systems today--and 15Gbps at 75 milliwatts, according to Intel. A paper outlining the issue was released at the VLSI Circuits Symposium in Japan this month.
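Converting those quoted figures into energy per bit (our arithmetic, from the numbers above) makes the comparison clearer:

```latex
\frac{10\,\text{W}}{1\,\text{Tb/s}} = 10\,\text{pJ/bit}, \qquad
\frac{14\,\text{mW}}{5\,\text{Gb/s}} = 2.8\,\text{pJ/bit}, \qquad
\frac{75\,\text{mW}}{15\,\text{Gb/s}} = 5\,\text{pJ/bit}
```

so the research link moves a bit for roughly a quarter to a half of the energy implied by the 10-watts-per-terabit status quo.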

Low-power I/O systems will be needed for core-to-core communication as well as chip-to-chip contacts.

"Without better power efficiency, this just won't happen," said Randy Mooney, an Intel fellow and director of I/O research.

Intel executives have said they would like to see massive multicore chips coming out in about five years. But a lot of work remains. Right now, for instance, Intel doesn't even have a massive multicore chip based around x86 cores, a company spokeswoman said.

The massive multicore chips from the company will likely rely on technology called Through Silicon Vias (TSVs), other executives have said. TSVs connect external memory chips to processors through thousands of microscopic wires rather than one large connection on the side. This increases bandwidth.

The Multi-Core Dilemma - By Patrick Leonard

By Steve Pitzel (Intel) on March 14, 2007 at 8:27 pm

Guest Blogger Bio: Patrick Leonard

Hardware Evolution

Throughout the history of modern computing, enterprise application developers have been able to rely on new hardware to deliver significant performance improvements while actually reducing costs at the same time. Unfortunately, increasing difficulty with heat and power consumption, along with the limits imposed by quantum physics, has made this progression increasingly less feasible.

There is good news. Hardware vendors recognized this several years ago, and have introduced multi-core hardware architectures as a strategy for continuing to increase computing power without having to make ever smaller circuits.

Sounds Good, So What's the Dilemma?

The "dilemma" is this: a large percentage of mission-critical enterprise applications will not "automagically" run faster on multi-core servers. In fact, many will actually run slower.

There are two main reasons for this:

  1. The clock speed for each "core" in the processor is slower than in previous generations. This is done primarily to manage power consumption and heat dissipation. For example, a single-core processor from a few years ago that ran at 3.0 GHz is being replaced with a dual- or quad-core processor with each core running in the neighborhood of 2.6 GHz. More total processing power, but each core is a bit slower.
  2. Most enterprise applications are not programmed to be multi-threaded. A single-threaded application cannot take advantage of the additional cores in a multi-core processor without sacrificing ordered processing. The result is idle processing time on the additional cores. Multi-threaded software should do better, but many people are finding that their multi-threaded code behaves differently in a multi-core environment than it did on a single core, so even these applications should be retested.

Won't my application server or operating system take care of this for me?

One of the key considerations here is the order of processing. A single threaded application that needs to ensure that A happens before B cannot run multiple instances concurrently on multiple cores and still ensure a particular order.

Application servers and operating systems are generally multi-threaded themselves, but unfortunately their multi-threaded nature does not necessarily extend to the applications that run on them. The app server and OS don't know what the proper order is for your particular business logic unless you write code to tell them. In fact, they are designed to simply process any thread as soon as possible, potentially disastrous in a business application. SMP (symmetric multiprocessing) has similar limitations.

So we are back to the same problem -- how to run multiple instances concurrently on multiple cores and still ensure a particular order.
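One common way out (a minimal sketch of ours, not Rogue Wave's Software Pipelines implementation; it uses C++11 threads, which postdate this article, and the Worker class and account key are invented for the example) is to route all work sharing an ordering key to the same single-threaded worker. Order is preserved within a key, while different keys run concurrently on different cores:

```cpp
#include <condition_variable>
#include <cstdio>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// One worker = one FIFO queue + one thread, so jobs posted to the same
// worker always execute in the order they were posted.
class Worker {
public:
    Worker() : done(false), t([this] { run(); }) {}
    void post(std::function<void()> job) {
        { std::lock_guard<std::mutex> l(m); q.push(std::move(job)); }
        cv.notify_one();
    }
    void stop() {  // drain remaining jobs, then join the thread
        { std::lock_guard<std::mutex> l(m); done = true; }
        cv.notify_one();
        t.join();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> l(m);
                cv.wait(l, [this] { return done || !q.empty(); });
                if (q.empty()) return;           // done and fully drained
                job = std::move(q.front());
                q.pop();
            }
            job();
        }
    }
    std::queue<std::function<void()>> q;
    std::mutex m;
    std::condition_variable cv;
    bool done;
    std::thread t;
};

int main() {
    std::vector<Worker> pool(4);                 // e.g. one lane per core

    // Events for the same account must stay ordered relative to each other;
    // events for different accounts are free to run in parallel.
    for (int seq = 0; seq < 16; ++seq) {
        int account = seq % 5;                   // the ordering key
        pool[account % pool.size()].post([account, seq] {
            std::printf("account %d, event %d\n", account, seq);
        });
    }
    for (auto& w : pool) w.stop();
    return 0;
}
```

The business logic inside the lambda knows nothing about threads; the ordering guarantee lives entirely in the routing layer, which is the spirit of abstracting the threading model out of application code, as discussed below.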

Intel's 45nm announcement

Intel recently announced that they will have chips in the near future with 45nm features, a significant advance from the 60-65nm that is prevalent today. The company has also made it clear that this is not reducing the need for multi-core.

Around the same time as this announcement, Intel announced that they have an 80 core processor in the works. Power and heat will have to be addressed for a processor with 80 cores to come to market. So 45nm may mean some increase in clock speeds for future processors, but its primary use will be enablement of a higher number of cores.

Concurrent Computing

There is no easy solution, but there are several options, and they all involve bringing concurrency to your software. Concurrent computing (or parallel programming, as many refer to it) is likely to be a very hot topic in the coming years, so it's a good idea to start preparing now.

Since multi-core servers already make up most new server shipments, concurrent computing in the enterprise will quickly become a way of life. So we need to put some thought into two things: how to make existing applications run concurrently, and how to build new systems for concurrency.

More people are talking now than at any time in recent memory about multi-threaded software development as the primary answer to concurrency. However, instead of writing our application code to be multi-threaded, we should consider how to abstract threading out of application code. Multi-threaded code is difficult to write and difficult to test, which is why many people have avoided it in the first place.

At Rogue Wave Software we have been working for several years in the area of "Software Pipelines". Software Pipelines* is an approach that can be used to abstract the threading model out of the application code. As software developers, you would not mix your UI code with your business logic, and for good reasons. A similar principle should apply for programming for concurrency -- the threading model should not be driven from within the application logic.

There are several important benefits to this approach. Removing threading from application code means:

  • The application developer doesn't have to own the threading model
  • Existing applications can move into a concurrent environment with much less effort
  • It is easier to scale to additional computing resources without modifying the application
  • If done right, the application can continually be tuned for performance without modifying application code

This approach does not allow the application developer to wash their hands entirely of concurrency. Application code needs to be written to be thread-aware, but does not need to have threads written into it. For more information on Software Pipelines, you can read this white paper (you have to log in to webservices.org to download it).

Consider how to abstract your threading model from your application logic, and you may find a smoother (concurrent) path ahead.

* Software Pipelines is a general term, not owned or trademarked by Rogue Wave or anyone as far as I'm aware. It also does not require the use of any of our technology. Software Pipelines borrows conceptually from hardware pipelines and also from fluid dynamics, which has interesting parallels to software systems.

Categories: Multi-Core

Intel® Core™2 Extreme Processor


Extreme exhilaration. Extreme enjoyment.
Whether it's gaming, digital photography, or video editing, today's high-impact entertainment demands breakthrough technology. Now with a new version based on Intel's cutting-edge 45nm technology, utilizing hafnium-infused circuitry to deliver even greater performance and power efficiency.

Intel® Core™ i7 Processor Extreme Edition

Conquer the world of extreme gaming with the fastest performing processor on the planet: the Intel® Core™ i7 processor Extreme Edition.¹ With faster, intelligent multi-core technology that accelerates performance to match your workload, it delivers an incredible breakthrough in gaming performance.

But performance doesn't stop at gaming. You'll multitask 25 percent faster and unleash incredible digital media creation with up to 79 percent faster video encoding and up to 46 percent faster image rendering, plus incredible performance for photo retouching and editing.¹

In fact, you'll experience maximum performance for whatever you do, thanks to the combination of Intel® Turbo Boost technology² and Intel® Hyper-Threading technology (Intel® HT technology)³, which activates full processing power exactly where and when you need it most.

Intel® Core™2 Duo Processor

Maximum everything. Energy-efficient performance. Multimedia power.
Intel® Core™2 Duo processor


Based on Intel® Core™ microarchitecture, the Intel® Core™2 Duo processor family is designed to provide powerful energy-efficient performance so you can do more at once without slowing down.

Intel® Core™ 2 Duo desktop processors

With Intel Core 2 Duo desktop processor, you'll experience revolutionary performance, unbelievable system responsiveness, and energy-efficiency second to none.

Big, big performance. More energy efficient.¹ Now available in smaller packages. The Intel Core 2 Duo processor-based desktop PC was designed from the ground up for energy efficiency, letting you enjoy higher performing, ultra-quiet, sleek, and low power desktop PC designs.

Multitask with reckless abandon. Do more at the same time, like playing your favorite music and running a virus scan in the background, all while you edit video or pictures. The powerful Intel Core 2 Duo desktop processor provides you with the speed you need to perform any and all tasks imaginable.

Love your PC again. Don’t settle for anything less than the very best. Find your perfect desktop powered by the Intel Core 2 Duo processor and get the best processing technology money can buy. Only from Intel.

  • Up to 6MB L2 cache
  • Up to 1333 MHz front side bus

Intel® Desktop Board DG965MQ

The Intel® Desktop Board DG965MQ is based on the Intel® G965 Express Chipset that supports 1066-MHz system bus, Intel® Graphics Media Accelerator X3000 (Intel® GMA X3000) with Intel® Clear Video Technology, dual-channel DDR2 800 memory and discrete PCI Express* x16 graphics in the microBTX form factor. Premium features such as support for Intel® Core™2 Processor with Viiv™ Technology∇, Intel® High Definition Audio (enabling 7.1 surround sound), Dolby* Home Theater* certification, Intel® PRO 10/100/1000 Network Connection and 1394a deliver stability and new features for consumers to enjoy a great digital entertainment experience. This Intel Desktop Board comes with the software required to meet Intel® Core™2 processor with Viiv™ technology brand verification requirements, which simplifies the task of building a PC based on Intel® Core™2 processor with Viiv™ technology.

The Intel® Desktop Board DG965MQ is Microsoft Windows Vista* Premium Ready. The Intel® G965 Express Chipset fully supports the visually stunning Windows Aero* user interface with amazing transition effects and realistic animations.


Tuesday, 10 February 2009

Intel® Multi-Core Technology

Permanently altering the course of computing as we know it, Intel® multi-core technology provides new levels of energy-efficient performance, enabled by advanced parallel processing and next-generation hafnium-based 45nm technology. Incorporating multiple processor execution cores in a single package delivering full parallel execution of multiple software threads, Intel multi-core technology enables each core to run at a lower frequency, dividing the power normally given to a single core. This provides a breakthrough experience in notebook and desktop PCs, workstations, and servers.

Central to our technology roadmap, Intel® multi-core processors based on 45nm Intel® Core™ microarchitecture are paving the way to the next revolution in processor technology—next-generation 32nm multi-core processors. By innovating future architectures that can hold dozens or even hundreds of processors on a single die, we're ensuring that Intel® technologies will continue to outpace demands well into the future.

Intel® quad-core technology


Discover next-generation 45nm Intel® quad-core technology providing four cores in a single processor. Ideal for multitasking and multimedia, Intel quad-core technology delivers exceptional energy-efficient performance for the ultimate computing experience.

Intel® dual-core technology


With two complete cores in a single processor, desktop and notebook PCs, workstations, and servers based on 45nm Intel® dual-core technology enable enhanced multimedia, gaming on the go, and energy efficiency without compromising performance.

Business: Perform with power

Build a scalable, flexible infrastructure with servers based on Intel® Xeon® processors and Intel® Itanium® processors.

See how companies like yours are being successful with multi-core and other Intel® technologies.

Defining your digital life

Explore multi-core processors designed for multimedia and high-impact gaming experiences.

Get dual-core multitasking performance for desktop and mobile PCs

Full Speed Ahead: DDR400 vs. RDRAM (PC1200)

The role that the processor plays in determining the performance of a PC system is often overestimated. In practice, slow RAM modules in combination with the chipset are what hamper optimal performance -- the result is that the CPU cannot run at full load.


A hand-picked DDR400 memory module from Infineon (engineering sample), which was able to run at over 200 MHz memory clock without any difficulties. Even at a setting of CL2.5, the test platform ran stably. Infineon "officially" states that it does not produce DDR400 for the time being.

The cards are on the table. In the race to achieve the highest memory performance, we used the most powerful, hand-picked modules straight from the labs of the manufacturers. Let's start out with some concrete numbers: the DDR400 module runs at a latency of CL2.5, and the RDRAM modules include special versions for a 600 MHz memory clock (PC1200). Here, the access time is 35 nanoseconds, and the fastest modules, at 32 ns, are due to hit the market in a few weeks. So the occasion that gave rise to this test is obvious: with the onslaught of motherboards and memory modules for DDR333 on the market, the question is to what extent overall performance can be boosted compared to DDR200 and DDR266. In addition, the Rambus platform with the Intel Pentium 4 has become available at prices that are more reasonable than ever before. The fact is unavoidable: RDRAM memory for PC800 (400 MHz memory clock) is sometimes even less expensive than DDR333 memory (166 MHz memory clock and CL2.5).

Rambus Mobile Memory for Smartphones



Mobile devices vary in how well they handle high-quality visual content, and much of that comes down to the memory they use; the shortfall has become a real pain point for phone owners. Rambus, however, may hold the key to fixing it with its mobile memory solution.

Advanced visuals are still a long way off, although the iPhone 3G does have some pretty impressive performance in its game lineup. Rambus’ Mobile Memory Initiative relies on a variant of the separate, flexible clock speeds used in XDR memory, albeit running on lower power and using low variance signaling to create extremely high bandwidth.

The Rambus mobile memory offers 4.3Gbps from a single chip at just 100mV -- roughly five times the headroom of today's 800-megabit mobile memory.

Rambus hopes MMI-based products will roll out early next year.


AMD Phenom II X4 940 Review - Not the Second Coming

Apparently the Phenom II processors are already available from a variety of resellers online. My initial information was that product would be several weeks away, but it now appears as if AMD has staged a hard launch. For example, Newegg already has product for sale.

The subtitle of this article should not be misconstrued as a negative observation. However, readers coming into this article should not expect miracles from AMD’s latest piece of silicon. The Phenom II is being launched today in two flavors, but unfortunately for consumers these products will not be widely available for some weeks. AMD fans should rejoice though, as AMD has produced a part that is far more competitive with what Intel has to offer.

The plight of the Phenom architecture has been well documented since its initial release in the late fall of 2007. There was a lot of anticipation for the architecture, and most of us were hoping that the latest brainchild of the engineers who brought us the original Athlon and the Athlon 64 would strike gold again with a forward-looking design that would tear the wings off the mighty Core 2 series of products from Intel. Alas, these dreams were shattered by the cold reality of a processor that, while faster per clock than the Athlon 64 X2, fell short of the performance mark set by Intel with the Core 2 Duo and Core 2 Quad products. Throw in a dash of TLB errata, which sank the B2 revision in terms of industry adoption, and we can see that AMD had dealt itself a poor hand with the first generation of Phenom processors.

AMD has regained some of its stature by producing the B3-revision Phenoms in a fairly timely manner and slowly ramping up their speed. The pinnacle of these parts is the Phenom 9950 BE, now a 125 watt part running at 2.6 GHz. What has kept AMD afloat in these hard times is the ability to produce these parts at good yields and sell them at competitive prices. AMD also maximized its earning potential by selling triple- and dual-core Phenom parts that used scavenged quad-core dies. All in all, AMD survived, helped along by a solid chipset portfolio that includes the very impressive 780G and 790GX parts, which have redefined integrated performance in terms of 3D and video playback.



Last January AMD announced that it was getting back the very first silicon of the 45 nm Phenom, along with the first B3 samples coming off the 65 nm line. This was a time of great excitement for most people around the industry, as they expected the B3 processors to be a bit more competitive with the Core 2 Quad, and some were frankly giddy over the possibility that the 45 nm Phenom would finally dethrone the Core 2 regime. There were some optimistic folk who thought that AMD might be able to ship 45 nm parts in the fall of 2008, and thereby beat Intel to the Nehalem punch. Cruel fate again turned against these souls, and AMD had to temper enthusiasm by reminding people that production silicon would not be coming off the line anytime soon, and that it would be late 2008 at the earliest before products hit the shelves.

Sure enough, fall came around and revised roadmaps did in fact show that desktop 45 nm processors would be on the street at the beginning of 2009, but AMD was able to ship the new 45 nm Opterons to select partners. Initial testing on these Opterons raised a few eyebrows, as they were much better able to compete with the latest Xeons in a variety of applications where the Xeons had simply dominated the older B3-based Opterons. Word was starting to leak out of AMD that it was seeing 20% to 40% more performance per clock from the 45 nm processors as compared to the 65 nm versions. This of course piqued interest: the potential was there not only to compete with the Core 2 Quads but, if AMD played its hand right, to field a competitor to Nehalem, and 2009 could be an entirely new ball game.



Here’s the spoiler. The Phenom II is not the second coming for AMD. It is not what the Athlon 64 represented when it was unleashed against the Pentium 4 all those years ago. What the Phenom II actually represents, though, is a milestone for AMD and its design teams.

Nehalem Revolution: Intel's Core i7 Processor Complete Review

This is an article we have all been anticipating for years now as it introduces the most dramatic shift in Intel processing technology since the introduction of the front-side bus. And ironically, it is this shift that will finally remove the FSB from Intel products for good. The Nehalem core architecture has been the focus of most of Intel's Developer Forums for the last 24 months and the culmination of the technology, marketing and products begins today.

Intel's Core i7 processors will bring a dramatic set of changes to the enthusiast and PC community in general including a new processor, new CPU socket, new memory architecture, new chipset, new motherboards and new overclocking methods. All of that and more will be addressed in our review today so be prepared for a LOT of valuable information.

The Nehalem Architecture - Years of data summed up

We have done more than our share of technical documentation of the architecture and design, enough so that I feel that duplicating all of it here would be somewhat of a disservice to our frequent readers. I will highlight the most important architectural shifts in the Nehalem design here but I still encourage you to read over my much more in-depth look at the processor design published in August: Inside the Nehalem: Intel's New Core i7 Microarchitecture.



Here you can see a die shot of the new Nehalem processor - in this iteration a four-core design with two separate QPI links and a large L3 cache relative to the rest of the chip. The primary goal of Nehalem was to take the big performance advantages of the Core 2 CPUs and modularize them. Now with the Nehalem design, which will be branded as the Intel Core i7, Intel can easily create a range of processors from one core to eight cores depending on the application and market demands. Eight-core CPUs will be found in servers, while you'll find dual-core machines in the mobile market several months after the initial desktop introduction. QPI (QuickPath Interconnect) channels can also vary in order to improve CPU-to-CPU communication.



At a high level the Nehalem core adds some key features to the processor designs we currently have with Penryn. SSE instructions get bumped to a 4.2 revision, branch prediction and pre-fetch algorithms improve, and simultaneous multi-threading (SMT) makes a return after a hiatus since the NetBurst architecture.

HyperThreading Returns



I mentioned before that Intel is using Nehalem to mark the return of HyperThreading to its bag of weapons in the CPU battle; the process is nearly identical to that of the older NetBurst processors and allows two threads to run on a single CPU core. But SMT (simultaneous multi-threading) or HyperThreading is also a key to keeping the 4-wide execution engine fed with work and tasks to complete. With the larger caches and much higher memory bandwidth that the chip provides this is a very important addition.

Intel claims that HyperThreading is an extremely power-efficient way to increase performance - it takes up very little die area on Nehalem yet has the potential for great performance gains in certain applications. This is obviously much more efficient than adding another core to the die, but just as obviously has some drawbacks compared to that method.



Here you can see Intel's estimations of how much HyperThreading can help performance in specific applications. Surprisingly, one of the best performers is the 3DMark Vantage CPU test, which simulates AI and physics on the processor, while POV-Ray 3.7 sees a huge 30% boost in performance for this relatively small cost in additional logic.

Welcome to the Uncore, we got fun and games...



A new term Intel is bringing to the world with this modular design is the "uncore" - basically all of the sections of the processor that are separate from the cores and their self-contained cache. Features like the integrated memory controller, QPI links and shared L3 cache fall into the "uncore" category. All of these components are completely modular; Intel can add cores, QPI links, integrated graphics (coming later in 2009) and even another IMC if it desired.