Tilera targets Intel, AMD with 100-core processor

Tilera hopes its new chips either replace or work alongside chips from Intel and AMD

Tilera on Monday announced new general-purpose CPUs, including a 100-core chip, as it tries to make its way into the server market dominated by Intel and Advanced Micro Devices.

The two-year-old startup's Tile-GX series of chips is targeted at servers and appliances that execute Web-related functions such as indexing, Web search and video search, said Anant Agarwal, cofounder and chief technology officer of Tilera, which is based in San Jose, California. The chips have the attributes of a general-purpose CPU, as they can run the Linux OS and other applications commonly used to serve Web data.

"You can run us as an adjunct to something else, though the intent is to be able to run it stand-alone," Agarwal said. The chips could serve as co-processors alongside x86 chips, or potentially replace the chips in appliances and servers.

Chip makers are continuously adding cores as a way to boost application performance. Most x86 server chips today come with either four or six cores, but Intel is set to release the Nehalem-EX chip, an x86 microprocessor with eight cores. AMD will shortly follow with a 12-core Opteron chip code-named Magny Cours. Graphics processors from companies like AMD and Nvidia include hundreds of cores to run high-performance applications, and those chips are also making their way into PCs.

The Gx100 100-core chip will draw close to 55 watts of power at maximum performance, Agarwal said. The 16-core chip will draw as little as 5 watts of power.

Tilera's chips have an advantage in performance per watt compared to x86 chips, but some buyers will be skeptical because the chips are not yet established, said Will Strauss, principal analyst at Forward Concepts.

"I don't think an average person is going to run out to buy a computer with Tilera in it," Strauss said. Intel has the advantage of being an incumbent, and even if Tilera offered something comparable to Intel's chips, it would take years to catch up.

But to start, Tilera is focusing the chips on specific applications that can scale in performance across a large number of cores. It has ported certain Linux applications commonly used in servers, like the Apache Web server, MySQL database and Memcached caching software, to the Tilera architecture.

"The reason we have target markets is not because of any technological limitations or other stuff in the chip. It is simply because, you know, you have to market your processor [to a] target audience. As a small company we can't boil the ocean," Agarwal said.

The company's strategy is to go after lucrative markets where parallel-processing capability has a quick payout, Strauss said. Tilera could expand beyond the Web space to other markets where low-power chips are needed.

It helps that applications can be programmed in C as with an Intel processor, but programmers are needed to write and port the applications, Strauss said. "How easy is it to port Windows or Linux also remains to be seen," he said.

Applications like Apache and MySQL already run on x86 chips and can be ported to run on Tilera chips, company executives said. In a co-processor environment, x86 processors would run legacy applications while the Tilera chip handles the Web-specific ones, they said.

"As a smaller company, we can focus in on a couple of applications, drive those, and over time as we grow, we can expand," said Bob Doud, director of marketing at Tilera. The company didn't talk about the markets it would like to go into in the future.

However, industry analysts say that application performance either levels off or even deteriorates as more cores are added to chips. Part of the performance relies on how the cores are assembled, said Agarwal, who is also a professor of electrical engineering and computer science at the Massachusetts Institute of Technology.

For faster data exchange, Tilera has organized its cores in a square mesh with multiple points to receive and transfer data; each core has its own switch. Chips from Intel and AMD rely on crossbars, but as the number of cores grows, that design could cause gridlock and lead to bandwidth issues, he said.

"You can have three or four streets coming in but ... it's hard to imagine 30 streets coming into an intersection," Agarwal said. The mesh architecture used in Tilera chips is expandable as the square gets bigger, he said.
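Agarwal's intersection analogy can be made concrete with a little arithmetic. Below is a rough sketch (not Tilera's actual design) comparing how wiring cost grows for a full crossbar versus a square mesh as the core count increases:

```python
# Sketch: compare interconnect wiring cost for a full crossbar versus
# an n x n mesh. Figures are illustrative, not from Tilera.

def crossbar_links(cores):
    # A full crossbar needs a dedicated path between every pair of ports,
    # so link count grows quadratically with the number of cores.
    return cores * (cores - 1) // 2

def mesh_links(n):
    # An n x n mesh only wires each core to its immediate neighbours:
    # n*(n-1) horizontal links plus n*(n-1) vertical links.
    return 2 * n * (n - 1)

for n in (2, 4, 10):                  # 4, 16, 100 cores
    cores = n * n
    print(cores, crossbar_links(cores), mesh_links(n))
```

At 100 cores the crossbar needs 4,950 pairwise paths while the mesh needs only 180 neighbour links, which is the scalability argument the mesh design rests on.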

Along with the extra cores, the new Tilera chips include many upgrades over their predecessors. The chips are speedier, running at up to 1.5GHz, with support for 64-bit processing. They will be made using a 40-nanometer process, which makes them smaller and more power-efficient; earlier chips were made using a 90-nm process. The chips will start shipping next year, with the 100-core chip scheduled to ship in early 2011. Volume pricing will range from US$400 to $1,000.



Agam Shah

IDG News Service




These people deserve better press

Courageous to come up with new stuff in a market that's saturated with stupidity. But the article left me wanting for little details like, what instruction set are they using? Five watts sounds like something that might do well in a laptop. And for those of us who aren't married to windows, no x86 is actually a plus.



A boost for graphics?

It sounds like you could use a single 100-core chip to build a serviceable portable render farm that would be affordable to 3D amateurs, if only for hire, at least.



It's got to be some other type of instruction set, since the only three companies on the planet allowed to make x86 chips are Intel, AMD, and Via.

It's something else, but something that Linux has been ported to run on, at least.




It's based on the ARM instruction set.

I'm skeptical at the lack of information they're releasing in these announcements. Additionally, they make it sound like it's easy to beat AMD and Intel in processing performance; if it were easy, AMD and Intel would do it too. For instance, the article talks about using switches to increase scalability in the core communication bus instead of a crossbar. What they don't mention is that the crossbar will never slow the bus down, whereas switches can be saturated and cause delays.

Also, for those ripping on x86: yes, it's a CISC instruction set, and yes, that leads to inherent delays in simple arithmetic and some logic processing. ARM is a RISC instruction set, where those computations are faster. But there's a lot more to the situation than just those instructions which execute faster. RISC processors can't execute code in out-of-order sequences, for one example. The code is also larger in size, taking up more RAM, more processor cache, and more space on that fancy data bus they've engineered.

Most software currently isn't written for more than half a dozen cores either. More importantly, most kernels do not scale well with many cores. You can write your application to scale to a million cores, but if it's running on a kernel that only scales to 6 or 12, then you're going to be bottlenecked by that most of the time.
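The scaling ceiling this commenter describes is essentially Amdahl's law; a quick sketch of the arithmetic (the 5% serial fraction is a made-up illustration, not a measured figure for any kernel):

```python
# Amdahl's law: if a fraction s of the work is serial (e.g. time spent in
# a kernel code path that doesn't scale), speedup on n cores is capped at
# 1 / (s + (1 - s) / n), no matter how parallel the application is.

def amdahl_speedup(serial_fraction, cores):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

# Even 5% serial work caps a 100-core chip well below 100x:
print(round(amdahl_speedup(0.05, 100), 1))
print(round(amdahl_speedup(0.05, 6), 1))
```

With 5% serial work, 100 cores deliver only about a 17x speedup, which is why both the application and the kernel underneath it need to scale.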




It is a MIPS based architecture.



More details please....

To fit in 100s of cores you most likely need to strip something away.

While I'm not saying this concept is untenable, a lot of detail has been left out as to how they are going to fit 100 cores on one chip. If you compare to GPUs, for example, they have many cores; however, those cores tend to have a tiny instruction set covering the few operations required for common graphical rendering approaches.

Having a better interconnect mechanism is great, but I personally would like to hear more about what they're going to strip down in comparison to modern CPUs. There's a lot you can remove from contemporary x86, for example. On the subject of x86, in agreement with the comment above, I also find it vague as to what instruction set/common architecture this chip would use.



God I hate Agarwal. Everything he said in 6.002 was lame then, and everything he says is lame now.




FYI, Sun Microsystems released a low power 8-core SPARC CPU, the UltraSPARC T1 codenamed Niagara 1, in 2005. The design of this CPU was open sourced. Is this a derivative of that? This CPU and the whole line of CPUs following it are targeted at high throughput workloads like webservers and database servers. An updated multi-socket version of this chip, UltraSPARC T2+ codenamed Victoria Falls, currently holds the world record in TPC-C.

Sun has already released details about their newest 16-core CPU, "Rainbow Falls", not yet released.




There's already a 240-core coprocessor on the market, and you can get it in a fully-integrated add-on card package for around $200. That's retail price. And it's commonly available -- you can go down to your local computer store and pick one up right now. Or you could get a version made with four sockets in rack-mount server, for a total of 960 cores in 1U. There's an industry-standard API which works on Linux, MS Windows, and Mac OS X. And since the hardware is common and popular, it's got the economies of scale to continue to get cheaper and faster. Oh, and odds are relatively good that you have one (or an earlier model) in your desktop computer right now.

This is the Nvidia GPU, of course. Not just for graphics anymore. That's the *real* competition for this product, and I'm skeptical that a startup without that scale advantage will stand a chance unless they have something *really* unique -- and it doesn't sound like they do.

The fact that it's a general-purpose CPU and could theoretically run a real OS rather than just serving as a coprocessor for calculation is interesting, but since that's unlikely to be an efficient use, seems mostly like a novelty.



Think again...

CISC and RISC mean basically nothing - when companies came out with RISC processors, CISC was basically a term invented by marketing to differentiate between them. In any case, basically every CISC processor is actually now a RISC processor - the CISC instructions are broken up into something more like RISC instructions before they're executed.
Also, CISC code isn't necessarily larger in size - in fact, at least on Mac OS X, the PowerPC and x86 halves of code tend to be similar in size. And CISC versus RISC means nothing in terms of latency for simple instructions - you could make either one very fast for simple instructions if you wanted.
And perhaps most blatantly, there is no restriction on whether or not you can have an out-of-order processor based on whether it's RISC or CISC. For that matter, the first x86 CPU to do out-of-order execution, the Pentium Pro, broke the x86 CISC instructions into internal micro-ops, more like RISC.



Details, details...

It's not ARM, it's not SPARC, it's not even MIPS, although it's closer to MIPS than anything else. It's their own instruction set. I haven't seen any details for this chip yet, but their previous offering (64 core) was 64 bit VLIW, meaning one to three instructions per 64 bit word.

If you look at a picture of a PowerPC or Sparc chip, you can see the footprint of the reorder engine - the part that takes care of dependencies between multiple instructions executing at the same time. It's huge. VLIW does the instruction dependency at compile time. There are many arguments for or against this, but there is no argument that it takes up a lot of die space.
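The "one to three instructions per 64 bit word" packing can be illustrated with a toy encoder. This is purely hypothetical - the 21-bit slot width is a guess for illustration, not Tilera's real bundle format:

```python
# Toy VLIW bundle packing: fit up to three fixed-width operation slots
# into one 64-bit word. Slot width and layout are invented for this
# sketch; the actual Tilera encoding is not public in this article.

SLOT_BITS = 21                        # 3 slots * 21 bits = 63 bits + 1 spare

def pack_bundle(ops):
    # Pack 1-3 operations, lowest slot first, into a single integer word.
    assert 1 <= len(ops) <= 3
    bundle = 0
    for i, op in enumerate(ops):
        assert 0 <= op < (1 << SLOT_BITS)
        bundle |= op << (i * SLOT_BITS)
    return bundle

def unpack_bundle(bundle, count):
    # Recover the individual operation fields from the packed word.
    mask = (1 << SLOT_BITS) - 1
    return [(bundle >> (i * SLOT_BITS)) & mask for i in range(count)]

ops = [0x1A2B3, 0x0FFFF, 0x12345]
assert unpack_bundle(pack_bundle(ops), 3) == ops
```

The point of the comment stands either way: with the compiler deciding at build time which operations share a bundle, the chip needs no reorder engine on the die.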

Previous Tilera processors also had relatively limited caches for each core, which saves more die space, but they don't seem to have announced what they'll have on the 100 core chip. If your program fits in cache then you run full speed, but even an x86 will thrash if your cache isn't big enough.

Keep in mind that they've been shipping a 64 core processor for two years, so 100 cores next year isn't as big a deal as it sounds.



Re: Interesting

> I'm skeptical at the lack of information they're releasing in these announcements.

This article lacks the details that a person who knows what "RISC" and "CISC" stand for would want to know. This is because the author is writing for a broad audience. If you want technical details, you are in the wrong place. Read the papers on RAW, the research project that was spun off into Tilera:


> Also, for those ripping on x86...

Who is ripping on x86?

> Most software currently isn't written for more than half a dozen cores either.

Look at the markets Tilera is aiming these chips at. These applications have lots of parallelism, require very high throughput, and need a low power footprint. The benefits of a system using a custom processor are large enough that paying someone to write software for the job is more than worth it. You seem to be arguing that this chip is not the best choice for every application. No one is saying that it is. I assume the folks at Tilera believe it is the best choice for some applications. I read many of their papers, and they make a persuasive argument. If you disagree, let's hear why.





240 cores? What are you calling a "core"? A SIMD compute element?

You're comparing apples and oranges. Can you run an operating system on a single nVidia "core"? (No)

If what you want is a ton of floating point compute or graphics rendering, then GPUs are great. But they're a pain to program for general apps.



Is it just Vaporware?

I like this line:
<cite>"The chips will start shipping next year, with the 100-core chip scheduled to ship in early 2011."</cite>

So, nothing is shipping yet. Next year, maybe something will ship -- who knows how many cores. And, by 2011, maybe 100 cores is possible, if they're still in business?



something unique

There is something genuinely unique here.

Let's assume the claims are legitimate and that we're talking about general purpose CPU cores in this Tilera chip. If that's true, then it already has distinct advantages over the Nvidia T10.

These chips use a mesh architecture whose performance scales well to larger core counts, unlike the interconnects in most other processors.

They also offer some amazingly low power. 55W max for 100 cores works out to a maximum of 0.55W/core. The Nvidia card can take up to 187.82W max at 240 cores, or 0.78W/core. This is an increased power consumption of 42%. More than that, it would require a host system. The 100 core Tilera chip would not be a plug-in card for a PC. The server would be configured to use the Tilera chip as the CPU. So, configure a normal system that would use an Nvidia GPU solution, remove the Nvidia card and replace the 90W host CPU with the 55W Tilera and you've just removed 222.82W from your config. If we make this a dual processor system to give comparable core count to the Nvidia solution, a second Tilera at 55W will still save you up to 167.82W per server. This is a big deal in large data centers.
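The power arithmetic in this comment checks out; here it is spelled out (wattage figures as stated above and in the article, not independently verified):

```python
# Reproduce the comment's power comparison, using its stated figures.
tilera_w, tilera_cores = 55.0, 100        # Gx100 max draw (article)
nvidia_w, nvidia_cores = 187.82, 240      # Nvidia card figures (comment)
host_cpu_w = 90.0                         # assumed host CPU in the comment

tilera_per_core = tilera_w / tilera_cores        # 0.55 W/core
nvidia_per_core = nvidia_w / nvidia_cores        # ~0.78 W/core
increase = nvidia_per_core / tilera_per_core - 1 # ~42% more per core

# Replace GPU card + 90 W host CPU with one 55 W Tilera chip:
saved_one = nvidia_w + host_cpu_w - tilera_w     # 222.82 W saved
# Add a second Tilera chip for a comparable core count:
saved_two = saved_one - tilera_w                 # 167.82 W saved
print(round(increase * 100), round(saved_one, 2), round(saved_two, 2))
```

Even the dual-chip configuration keeps the claimed 167.82W advantage per server, which is where the data-center argument comes from.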

Plus, we don't know how much memory these chips can address. The Nvidia T10 can only address 4GB. If the Tilera can address more, its usefulness becomes very apparent.

If the performance at 1.5GHz and 100 cores is good, then this would be an excellent replacement for a lot of general purpose servers in data centers. A blade-style server using systems based on these cores could potentially save a lot of money and offer better performance than a VM environment.

Let's not forget lesser applications. If I'm running all open source software on a linux platform, then I don't care which processor architecture I'm using. The 16-core version at 5W max would be perfect for a linux laptop/netbook or even a standard office desktop. Hundreds or thousands of office desktops with maximum CPU power draws at 85W lower than normal would save a lot of money too.



CISC code isn't necessarily larger in size

> Also, CISC code isn't necessarily larger in size..
Sir, CISC is supposed to be smaller in size, not larger.



Some further info

Hi there. I work for Tilera and can clear up a few things.

The cores are RISC but not specifically based on ARM, MIPS or PowerPC; comparatively, they're closest to a MIPS instruction set. We designed our own cores because we didn't think the existing selection of licensable cores had the right combination of die-area and power-efficiency. And yes, these are full featured 64-bit cores that can individually run Linux, Apache, etc.

Each core has 32KB L1i and 32KB L1d cache, and each core has 256KB of L2 cache, with full virtual memory support, TLBs, etc. With our coherent distributed caching system, the L2s of all the cores (26MB total in the 100-core chip) can be accessed and shared by other cores. This avoids the power problems and contention problems that you would have with large centralized caches.
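For the curious, the quoted 26MB total is just the per-core L2 caches summed, counted in decimal megabytes:

```python
# Per-core caches as stated in the comment above: 256 KB of L2 per core,
# 100 cores. Sum them and express the total in decimal megabytes.
KIB = 1024
l2_total_bytes = 100 * 256 * KIB
print(l2_total_bytes / 1e6)   # ~26.2 decimal MB, the "26MB" quoted
```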

For the memory-hungry, the TILE-Gx100 can address up to 1TB of directly connected DDR3 memory.




ATI HD 5850 Sapphire

I run on average, 1440 general purpose cores in my pc and it only cost me $400 NZD, lol!

