December 20, 2006

RDC in China

I just came back from a couple of weeks in China. Fascinating country. Besides the usual customer visits, we also held three Regional Development Conferences (RDCs) in Shenzhen, Shanghai and Beijing.

The RDCs were a huge success with on average more than 400 attendees per day. And they were all very interested in hearing about Wind River's latest offerings and future plans.

How do I know they were all interested? Well, my leading indicator was that the seats filled up from the front toward the back. That is somewhat different from your typical American or European conference, where the back seats seem to be the hottest commodity. (BTW, I am of course not implying that our American and European friends are not interested... Perhaps they just have better eyesight.)

  Paul Chen on stage in Shenzhen

We discussed in some detail the various components of Wind River's Platforms, including new features of our VxWorks, Linux, Workbench and Diagnostics offerings. If you haven't seen it already, we just released a new version of our platforms, so there was plenty to talk about. I also had an opportunity to go into some detail about our multicore/multiprocessing offerings. That tied in nicely with a presentation from Freescale about their multicore plans.

Judging from the number of questions in the Q&A session and the questions I got afterward, multicore is getting huge in China. There was a lot of discussion about whether Asymmetric Multiprocessing (AMP) or Symmetric Multiprocessing (SMP) is "better". As with many things in life, there is no simple answer to that question beyond the unsatisfactory "it depends". It is really an area where there is a lot of confusion. The confusion starts at the definition level: some people define SMP as Symmetric Multiprocessors (meaning multiple identical processors that might or might not have separate OSes on top of them), while others define it as Symmetric Multiprocessing, meaning symmetrical processors handled by one OS. I'll talk about my views on when to use what in a future blog.

I continue to be impressed by the quality of the engineers in China. Very smart and creative. So impressed, in fact, that we decided to open a development center in Beijing. We have the first ten engineers in training in Alameda right now so that we can get a running start next year. The team will focus on doing BSP work for both our VxWorks and Linux platforms, and I expect a lot of exciting new technology to come out of the group in the coming years. It is also a definite advantage to have developers close to your customers, something we have experienced again and again through our development centers around the US, Canada and Europe. Nothing beats that local firsthand experience.

As for my other experiences in China I'll have to refer to the old saying, "what happens in Beijing, stays in Beijing"...

October 23, 2006

Embedded = IT + 15 years?

The cryptic title refers to an observation that many people make about the embedded industry: that it trails the IT industry by a number of years, and that most trends in the IT world will eventually find their way to the embedded world. This is especially true as we are finding our way to the world of Device Software Optimization (DSO). See John's blog about DSO.

That raises two questions:

  1. Why is it that Embedded is so "behind"?
  2. Do all IT trends come to Embedded?

I believe the answer to these questions lies somewhere in understanding what is different between the deceptively similar worlds of IT and Embedded. I will focus on the technical comparison between these worlds, even though there could be a similar discussion on the business side.

There are a lot of similarities. After all, both are about running software on processors. So trends around programming efficiency (higher level languages, programming models, higher abstraction levels) and better hardware (faster, cheaper, fewer variants) should apply to both, right?

Let's look at some of these trends:

Programming efficiency

In the IT world the trend has certainly been very clear. When it comes to programming languages we have moved from binary to assembly, to FORTRAN/COBOL/C to Java/C++/C# and sometimes up to modeling languages. With each of these higher level languages we also typically get a higher abstraction level. We need to know less about the hardware we run on and we get tons of standardized libraries that do a lot of the work we used to have to program ourselves. This makes it possible to reuse the code in the underlying platform and focus on what is new on the top. Programmers become more efficient.

This added efficiency comes at a cost. The cost is that we use more processing power, more memory, etc. for any given task, than what it would take if you wrote the same application from the ground up. But it is worth it, since it is cheaper to throw hardware at the problem than to add programmers.

In a device the economics are sometimes different. A more powerful processor not only costs more, it is also using a lot more power, draining batteries faster, etc. For mass-produced cheap devices that extra cost of a faster processor and more memory might make it too expensive.

Java is a great example of how the trade-offs are different for devices. Michael Scharf would argue that Java is as efficient as C, but of course he is wrong ;-). The way Java gets decent performance on a workstation is by using a technique called just-in-time (JIT) compilation, which means that as you execute the code, you convert the Java byte-code to native code that the processor can run directly. A JIT compiler uses a lot of memory, since it needs to have a smart compiler on the target and it needs to cache the compiled code. This is of course a problem on a device, where you want both fast execution and a small footprint.

Another problem with higher level languages such as Java is that the more you abstract from the hardware, the harder it is to control the timing of execution. Many devices have real-time considerations, meaning that you have to guarantee execution of code within a certain time window. A classic example is an airbag. You don't want the Java garbage collector to kick in exactly that millisecond when the airbag is supposed to inflate.

Hardware trends

Many hardware trends move quickly from the desktop to embedded. Take things like utilizing power efficient processors, multicore and cheaper memory. All these trends quickly find their way into devices.

One thing that has not happened, at least not yet, is the consolidation to a smaller number of processor architectures and variants. All the various mini-computers of the 70s and 80s have now been replaced by just a couple of architectures, x86/IA32 being the dominant one. In embedded we still have a lot of architectures, and even within one family there are endless variants. Individual PowerPC processors, for example, have many differences such as hardware floating point in multiple flavors (no FP, standard PPC, Altivec, E500 v1 and v2, ...). This means that OS and tools vendors have to support a big matrix of variants. Contrast that with the x86, where the same tools and OS run on all variants from multiple vendors. Why is that?

As with everything there are a number of reasons, including:

  • The need for backwards compatibility has been much greater in the IT space
  • The drive towards standards has been faster in the IT space
  • Workstations/servers/desktops have more similarities than say a phone and a network router
  • Devices do more specialized tasks

The standardization that is happening in the embedded space is very vertical, so for example most advanced phones are using ARM-based processors these days.

The questions

So, did we answer the questions of why, and whether, embedded will always be behind IT? Not really ;-).

But I am interested in your opinions!

In reality there are probably some things that will always be different (such as the economics of using higher level languages) and some things that could be much more similar. In particular I believe we need to change our thinking when it comes to driving standards. We need better standards in this space to become more efficient.

This is really what DSO is all about: Think differently and drive the industry to a higher level of efficiency.

October 05, 2006

Is footprint still important for embedded devices?

Footprint, or the amount of memory your software uses, has traditionally been a big deal for embedded devices. But nowadays, with RAM and flash memory costing a fraction of what they used to, surely footprint is a non-issue? Just throw in more memory. It's cheap, right?

Well, not so fast there Mr. (or Mrs.)

Call me old-school, but I believe that footprint is still a huge issue in many devices (not all, but many), and I'll tell you why.

When I got started doing serious (or not so serious) programming, my machine of choice was a TRS-80, also called Trash-80 by the Pet/Commodore/Apple crowd. Well, I actually didn't have a choice since this was the machine my father brought home, being a gadget freak. We are talking late 70's here and the system I used had a whopping 4KB of RAM, quickly upgraded to 16KB. The programming language (after a quick stint of MS Basic) was Z80 assembly language. With that kind of constrained environment, you develop a 6th sense of solving your programming problems in the simplest way possible, using as little code and data as you possibly can. I am mentioning this not only to show off my age, but also to put things into perspective.

In the PC/Workstation/Enterprise markets, footprint is clearly not the top criterion when developing software. Just compare the footprint of Office 97 to Office 2003. We are talking about at least a fivefold increase in space needed. This is not really a huge problem since the disk and RAM sizes on your average PC have increased at least as much.

But things are different in the embedded world. I'll give you three reasons for why footprint matters, but I am sure there are more:

  1. The cost of memory is more important in a device compared to a PC
    Even though memory in the long run is getting much cheaper, so are other components too, and at the end of the day every dollar you save in hardware cost translates to profit when you sell thousands or millions of fairly cheap devices.
  2. Power consumption
    More memory uses more power. Anybody who has cursed when their Windows Mobile device died before the end of one day's use knows why this is important. It is not just the total amount of memory you have in the device, it is also the amount of powered memory at any given instant. Smart devices turn off the memory they don't use.
  3. Boot time
    Many devices, like digital cameras, need to boot really fast. On many systems, a big chunk of the boot time is to copy the software from a flash memory into a RAM memory. The smaller the code is, the faster it boots.
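
To put rough numbers on the boot time point, here is a back-of-the-envelope sketch. It assumes the copy from flash to RAM is limited only by the flash read rate; the 20 MiB/s rate and the two image sizes are hypothetical values picked for illustration, not measurements from any real device.

```python
# Rough estimate of the flash-to-RAM copy time during boot.
# All numbers are hypothetical and chosen only to illustrate the scaling.

MIB = 1024 * 1024
FLASH_READ_RATE = 20 * MIB  # assumed flash read rate in bytes per second


def copy_time_seconds(image_bytes, flash_read_bytes_per_second):
    """Time to copy an image of the given size at the given read rate."""
    return image_bytes / flash_read_bytes_per_second


for image_mib in (2, 8):
    seconds = copy_time_seconds(image_mib * MIB, FLASH_READ_RATE)
    print(f"{image_mib} MiB image: {seconds * 1000:.0f} ms to copy")
```

Under these assumptions the copy time scales linearly with the image size, so an image four times larger spends four times longer in this phase of the boot.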

OK, so maybe footprint is still important, but what can I do to keep the size down on my device?

Here are a few random tips. Sometimes code size is a trade-off against other things, like number of features, performance, ease of programming etc., but I find that keeping the code size down also tends to keep the complexity down, which is a good thing for quality.

  • Choose a programming language that is optimized for code size
    This is definitely a trade-off. Languages like C are excellent for footprint (you get what you expect) while Java would be at the other end of the spectrum. C++ is one of those languages that can be very effective if you know what you are doing, but I have seen so many examples of bloated C++ applications that I advise you to be careful. Read Stroustrup's (father of C++) insightful comments. In my mind (being a former compiler writer) C++ is a very complex language to write a compiler for and that translates into a lack of transparency when it comes to getting the code you expect. Don't use all the features of C++ just because you can.
  • Choose a compiler that optimizes for code size
    Most compilers optimize for speed. One of the bigger differences with an embedded compiler is that it also looks at code size. For example, one of the reasons we compile VxWorks with the Wind River compiler (formerly known as Diab) is that the code not only runs about 10% faster, it is also about 10% smaller compared to the gcc compiler.
  • Choose an operating system that can scale and only include what you need
    This is a big difference between operating systems designed for devices (like VxWorks) and general purpose operating systems (like Linux). An RTOS (Real-Time OS) like VxWorks can almost be considered more of a library, where only what you actually use ends up in your code image.

But the most important way of achieving a small footprint is to architect your code for simplicity and small size. This is easier said than done, especially when you are dealing with existing code.

September 20, 2006

Intel and Multicore Tools

Intel is releasing tools to help programmers deal with multicore. See the news.com story: Intel: Optimize applications for multicore. In the article, James Reinders states that "It's not intrinsically harder to write threads, but developers need to get used to thinking that way and we need help from the tools." While I agree that developers need to get used to thinking that way, I actually do believe that it is intrinsically harder to write multithreaded applications. And I have the grey hair to prove it...

This raises the question, "Why is it so different to develop for multicore compared to a single core (or unicore)?".

Well, one of the reasons is that most software is written in a very serial fashion. This means that the sequence of the things a program is dealing with is very well defined and controlled. Programmers like that. There are no surprises due to timing, and debugging is fairly straightforward. You can often use the normal paradigm of setting a breakpoint, stopping the code, looking around and continuing.

If you want to get any more horsepower out of a multicore, you need to execute tasks in parallel. There is no use adding that extra core if it sits idle while the first core is executing your neat serial program. To execute in parallel means that you have to split up the computations into separate threads that run concurrently. And we all know it is hard to do many things at once, especially if you are male (well, maybe that's a myth) and even more so if you are an ex-president.

In programming, the difficulties with multitasking show up when you are sharing data and multiple tasks are accessing the data at the same time, as exemplified by the traditional ticket booking problem. Here is a race condition that would occur if the database that holds the number of tickets left for an event is not protected:

  • Tickets in database(TicketsLeft): 10
  • Ticket agent A reads TicketsLeft: 10
  • Ticket agent B reads TicketsLeft: 10
  • Ticket agent A subtracts 1 and writes back: 9
  • Ticket agent B subtracts 1 and writes back: 9
  • Tickets in database: 9

This way you got two tickets for the price of one!
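
The sequence above can be replayed step by step in a few lines of code. This is only a sketch: the dictionary stands in for the database, and the names (`database`, `read_tickets`, `write_tickets`) are made up for the illustration. The interleaving is hard-coded so that the lost update shows up on every run, instead of depending on unlucky timing.

```python
# Deterministic replay of the unprotected ticket-booking sequence.
# The dictionary stands in for the database; all names are illustrative.

database = {"TicketsLeft": 10}


def read_tickets():
    """Read the current ticket count (first half of a booking)."""
    return database["TicketsLeft"]


def write_tickets(value):
    """Write a new ticket count back (second half of a booking)."""
    database["TicketsLeft"] = value


# Both agents read before either one writes.
seen_by_a = read_tickets()      # agent A reads 10
seen_by_b = read_tickets()      # agent B reads 10
write_tickets(seen_by_a - 1)    # agent A writes back 9
write_tickets(seen_by_b - 1)    # agent B writes back 9, overwriting A's update

tickets_sold = 2
tickets_left = database["TicketsLeft"]
print(f"sold {tickets_sold}, database says {tickets_left} left")
```

Two tickets were sold, but the count only went down by one, because agent B's write was based on a value that was already stale.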

The typical way to protect against this is through some sort of lock, to make sure that B does not access TicketsLeft until A is done. That is what mutual exclusion and semaphores are all about. The flip side of semaphores is that the more things you protect, the more likely you are to get two tasks waiting for each other, also called a deadlock.
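
As a minimal sketch of the locking idea, here is a ticket counter whose read-modify-write is guarded by a mutex, using Python's standard `threading.Lock`. The class and function names (`TicketDatabase`, `book_one`, `run_agents`) are invented for this example, and a real booking system would of course look different.

```python
import threading


class TicketDatabase:
    """Ticket counter whose read-modify-write is guarded by a lock."""

    def __init__(self, tickets_left):
        self.tickets_left = tickets_left
        self._lock = threading.Lock()

    def book_one(self):
        """Sell one ticket; return True on success, False if sold out."""
        with self._lock:  # only one agent at a time gets past this point
            if self.tickets_left == 0:
                return False
            current = self.tickets_left
            self.tickets_left = current - 1
            return True


def run_agents(db, agent_count, attempts_per_agent):
    """Let several agent threads book tickets; return the total sold."""
    sold_per_agent = [0] * agent_count  # each thread writes only its own slot

    def agent(index):
        for _ in range(attempts_per_agent):
            if db.book_one():
                sold_per_agent[index] += 1

    threads = [threading.Thread(target=agent, args=(i,))
               for i in range(agent_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sum(sold_per_agent)


db = TicketDatabase(10)
total_sold = run_agents(db, agent_count=4, attempts_per_agent=5)
print(f"sold {total_sold}, database says {db.tickets_left} left")
```

Because the check and the update happen while the lock is held, the four agents together can never sell more tickets than the database started with, no matter how the threads are scheduled.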

Read more about race conditions, deadlocks and semaphores at Wikipedia.

There are two reasons why multitasking issues like the ones described become a much more serious problem in multicores:

  1. In multicores, you need to use multiple tasks to get performance gain, while in a unicore you can keep it simple
  2. Multitasking applications tend to display any race conditions much more often on a multicore due to the higher probability that the critical code is actually executed in parallel. On a unicore you only get exposed when the timer interrupt happens at a very unfortunate time.

Tools like the ones from Intel and HP will help you detect these issues by keeping track of semaphores and memory accesses. They are not foolproof, but any help you can get is good.

In the Wind River portfolio, we have a great new tool that will help you find threading problems as well as other very tricky problems. The tool I am thinking about is Workbench Diagnostics. With this tool you can dynamically, without stopping the system, insert code (sensor points) that can execute at any line of code and log any information you are interested in. For example, you can very easily create a sensor point where you put in some C code that will check whether a particular semaphore is taken when you access some data.

I strongly believe that tools like Workbench Diagnostics are the way most people will debug complex problems in the future. Any time you have a timing problem, and the most difficult problems to fix are timing problems, it becomes very hard to stop the system and do the traditional debugging with breakpoints. Sensor points provide the ultimate printf debugging capability ;-)

Tomas Evensen

  • Tomas Evensen is Chief Technology Officer at Wind River. In this role, he is responsible for future technologies and architectures at Wind River. Tomas is also Vice President for VxWorks Engineering and is responsible for all VxWorks Platforms Releases.