Authors Jeremy Bennett
License CC-BY-2.0
Who ate my battery?
Why software engineers are the key to low power software design
Jeremy Bennett, CEO Embecosm
22 March 2012
Agenda
Designing energy aware systems
Hardware and software working together
– unified system debug
Experience with OpenRISC
Copyright © 2012 Embecosm. Freely available under a Creative Commons license
Designing Energy Aware Systems
Why?
                Ericsson T65                       Sony Ericsson Xperia X10 mini
– released      2001                               2010
– battery       Li-Ion 720 mAh                     Li-polymer 930 mAh
– standby       300 h                              up to 285 h (3G) / 360 h (2G)
– talk time     11 h                               up to 4 h (2G) / 3.5 h (3G)
– other         includes talk/standby prediction
Free Mobile Apps
“Drain Battery Faster”
Energy Saving Today
Largely in the realm of hardware engineering
Hardware design aims to minimize
– static (leakage) power loss
– dynamic (switching) power loss
Techniques used
– dynamic voltage and frequency scaling (DVFS)
– multiple mode operation (standby, sleep, suspend, off)
Scope for savings
– P = V²/R
– on-chip voltage can range from ~0.5V to ~1.5V
– lower frequencies mean lower voltages can be used
win on both static and dynamic power loss
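The quadratic dependence on voltage can be made concrete with a small calculation. This is a minimal sketch, assuming power is proportional to the square of the supply voltage (P = V²/R with R held constant); the function name `relativePower` is hypothetical and not part of the original material.

```cpp
#include <cassert>

// Power relative to a baseline when only the supply voltage changes,
// assuming P = V^2 / R with R held constant (an illustrative simplification).
double relativePower(double newVolts, double oldVolts) {
    assert(oldVolts > 0.0);
    return (newVolts * newVolts) / (oldVolts * oldVolts);
}
```

Under this assumption, dropping from ~1.5V to ~0.5V gives `relativePower(0.5, 1.5)`, which is 1/9, so roughly 11% of the original power.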
Is this where the greatest savings can be made?
Greater Savings at Higher Levels
[Figure: bar chart of potential savings by design level, from highest to lowest: Architectural, RTL Synthesis, Gate, Layout; horizontal axis 0% to 100%]
Source: LSI Logic
Shouldn't We Look Higher?
A Linux implementation famously wasted 70-90% of its energy,
simply waking several times a second to drive a blinking cursor.
A project had to raise its clock frequency because a standard
codec caused excessive processor stalls through cache conflicts.
That project was cancelled shortly afterwards.
Focussing on Software
Software controls the hardware
– algorithms and data flow
– compiler optimization traditionally puts speed above everything
Few software engineers appreciate this
– how does an algorithm affect the power consumption?
– power consumption is often a secondary design criterion in software
Yet biggest savings are at the higher levels of abstraction
– choice of algorithm
– data handling
– entire software stack
Why?
– energy is consumed by the hardware computations
– but ultimate control of that hardware lies with the software
Greater Savings at Higher Levels
[Figure: bar chart of potential savings by level, from highest to lowest: Application, Compiler, Programming Language, ISA, Architecture, RTL Synthesis, Gate, Layout; horizontal axis 0% to 100%]
Source: Bennett & Eder, 2011
This is Not New
Kaushik Roy and Mark C. Johnson. 1997. Software design for low
power. In Low power design in deep submicron electronics,
Wolfgang Nebel and Jean Mermet (Eds.). Kluwer Nato Advanced
Science Institutes Series, Vol. 337. Kluwer Academic Publishers,
Norwell, MA, USA, pp 433-460.
Choose the best algorithm to fit the hardware
Tune algorithms to manage memory size and memory access
Optimize for performance, making best use of parallelism
Use hardware support for power management
Minimize CPU and data path switching in the generated code
Hardware Power Analysis
Late in the design flow
Slow
Source: Eder, 2011
Software Power Analysis
Source: Eder, 2011
Tractable Software Power Analysis
Source: Eder, 2011
The Tool Challenge
How do we do high level power estimation?
Existing power analysis tools
– operate at gate level
– accurate, but very slow
Instruction level power analysis
– Tiwari et al, 1996
– highly parameterized formulaic approach
– no data on accuracy
Wattch: An architectural power analysis tool
– Brooks et al, 2000
– requires parameterized models of common functional blocks
– accurate to ±10%, relatively fast
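The instruction level approach can be illustrated with a toy estimator. This is a minimal sketch, assuming each opcode has a fixed base energy cost and that a single constant overhead applies whenever consecutive opcodes differ; the published model uses a per-pair overhead table, which is simplified away here. The type `InstructionEnergyModel`, its members and all cost values are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy instruction level energy model (illustrative only).
struct InstructionEnergyModel {
    std::map<std::string, double> baseCost; // energy per opcode, arbitrary units
    double switchOverhead;                  // extra cost when the opcode changes

    // Sum base costs over the trace, adding the overhead on each opcode change.
    // Throws std::out_of_range if the trace contains an unknown opcode.
    double estimate(const std::vector<std::string>& trace) const {
        double total = 0.0;
        for (std::size_t i = 0; i < trace.size(); ++i) {
            total += baseCost.at(trace[i]);
            if (i > 0 && trace[i] != trace[i - 1]) {
                total += switchOverhead;
            }
        }
        return total;
    }
};
```

With hypothetical costs of 1.0 for `add`, 2.0 for `load` and an overhead of 0.5, the trace `add, add, load` is estimated at 4.5 units.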
What Have We Learned So Far?
Use a low power instruction encoding
– minimize Hamming distance between consecutive instructions
reduces switching
– ISA is very target application specific (requires profiling to create)
– up to 62% reduction in opcode switching observed
– Woo et al, 2001
Partition the register file
– 25% of registers account for 83% of access time
– partition into “hot” and “cold” registers
– average 54% energy savings compared to non-partitioned
– Guan & Fei, 2010
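The switching metric behind the low power instruction encoding can be sketched as follows. This is a minimal sketch, assuming 32-bit instruction words and counting only bit toggles between consecutive words in a trace; the function names are hypothetical and the cited work may measure switching differently.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of bit positions that differ between two 32-bit instruction words.
inline std::size_t hammingDistance(std::uint32_t a, std::uint32_t b) {
    return std::bitset<32>(a ^ b).count();
}

// Total bit toggles seen on the instruction bus over a trace of words.
std::size_t totalSwitching(const std::vector<std::uint32_t>& trace) {
    std::size_t toggles = 0;
    for (std::size_t i = 1; i < trace.size(); ++i) {
        toggles += hammingDistance(trace[i - 1], trace[i]);
    }
    return toggles;
}
```

Profiling a target application with a function like `totalSwitching` under two candidate encodings would give a first comparison of their opcode switching.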
Higher Abstraction Still
Specialist programming languages
– allow the programmer to exploit power/accuracy tradeoffs
– use approximate types where appropriate
– programmer annotates the code
– type checker separates out approximate and accurate code
– Sampson et al 2011
Hardware support for approximate computation
– variable bit width in floating point calculations
– up to 66% power saving
– Tong et al 2000
These are niche examples. What is the generic solution?
– key is tools that allow software designers to explore solutions
– profile energy usage as easily as performance
tools from Lauterbach and research by Steve Kerrison at Bristol University
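The idea of a type checker separating approximate from accurate code can be imitated in C++. This is a minimal sketch, assuming a wrapper type with no implicit conversion back to the precise type; `Approx` and `endorse` are hypothetical names chosen for illustration, and the cited language work uses annotations rather than this mechanism.

```cpp
#include <cassert>
#include <type_traits>

// Wrapper marking a value as approximate. Arithmetic stays approximate;
// the only way back to a precise value is an explicit endorse() call.
template <typename T>
class Approx {
public:
    explicit Approx(T value) : value_(value) {}

    friend Approx operator+(Approx a, Approx b) { return Approx(a.value_ + b.value_); }
    friend Approx operator*(Approx a, Approx b) { return Approx(a.value_ * b.value_); }

    // The programmer states that an approximate result is good enough here.
    friend T endorse(Approx a) { return a.value_; }

private:
    T value_;
};

// The compiler rejects accidental flow from approximate to precise code.
static_assert(!std::is_convertible<Approx<double>, double>::value,
              "approximate values must be endorsed explicitly");
```

With this wrapper, `double x = Approx<double>(1.0);` fails to compile, whereas `double x = endorse(Approx<double>(1.0));` is accepted.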
The Energy Aware Computing Initiative (EACO)
Ultimate goal is a European wide program of research
Led by the Institute for Advanced Studies at Bristol University
Kicked off with three dedicated workshops in 2011
– http://www.bristol.ac.uk/ias/workshops/current-workshops/energy-aware-computing.html
Intellectual challenges
– incremental improvements
– radically new innovative approaches
Conveners
– Dr Kerstin Eder
– Prof David May
More collaborators wanted
– contact Kerstin Eder
Hardware and Software Working Together
Unified System Debug
A Typical SoC Design Flow
Stage           Exploration & System Design   Implementation          System Verification   Silicon
Hardware Team   SystemC TLM                   Simulation & CA Model   FPGA or Emulation     Silicon
Software Team   ISS + Debugger                ISS + Debugger          ISS/ICE + Debugger    Silicon + Debugger (?)
The Technical Solution
Embedded software tools need two key features
– they must be “peripheral aware”
when the program halts, the peripherals halt
the debugger has visibility into the peripherals
– they must work with models as well as final silicon
models of the complete SoC
high level, low level, software or FPGA emulation
This is not a technical challenge
– most debuggers extend easily to peripherals
– JTAG provides a good abstraction of the interface
– the EDA world knows how to model SoCs
Adding “Peripheral Awareness”
Reading peripherals
– GDB info command
    (gdb) info spr picmr
    PIC.PICMR = SPR9_0 = 0 (0x0)
    (gdb)
Writing peripherals
– GDB set command
    (gdb) set spr picmr 0x00000007
    PIC.PICMR (SPR9_0) set to 7 (0x7), was: 0 (0x0)
    (gdb)
Watchpoints
– new GDB command
– depends on target abilities
    (gdb) pwatch picsr
    Peripheral watchpoint 2: PIC.PICSR (SPR9_2)
    (gdb)
Peripheral Awareness Using the GDB Remote Serial Protocol
Extend using standard Remote Serial Protocol (RSP) features
– A reliable packet based protocol
qRcmd packet used to access peripherals
– e.g. readspr to read a peripheral register
– e.g. writespr to write a peripheral register
Future proof against GDB upgrades
– RSP compatibility is always ensured
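The packet framing that makes RSP reliable can be sketched briefly. This is a minimal sketch of the standard framing, `$<payload>#<checksum>`, where the checksum is the modulo-256 sum of the payload bytes written as two hex digits, together with the hex encoding that the standard monitor-command packet (`qRcmd`) applies to its command string. The function names are hypothetical, escaping of special characters in the payload is not handled, and the exact argument syntax of `readspr` and `writespr` is not shown because it is not given here.

```cpp
#include <cassert>
#include <string>

// Encode each byte of text as two lowercase hex digits.
std::string hexEncode(const std::string& text) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (unsigned char c : text) {
        out += digits[c >> 4];
        out += digits[c & 0xf];
    }
    return out;
}

// Wrap a payload as $<payload>#<two-digit modulo-256 checksum>.
std::string rspFrame(const std::string& payload) {
    static const char digits[] = "0123456789abcdef";
    unsigned sum = 0;
    for (unsigned char c : payload) {
        sum = (sum + c) & 0xff;
    }
    std::string packet = "$" + payload + "#";
    packet += digits[sum >> 4];
    packet += digits[sum & 0xf];
    return packet;
}
```

A monitor-style request would then be built as `rspFrame("qRcmd," + hexEncode(command))`, where `command` is whatever string the target's RSP server understands.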
Debugging Models and Silicon
SystemC TLM 2.0
– modeled as debug I/F
    class SocTlmModel
      : public sc_core::sc_module
    {
      …
      tlm::tlm_transport_dbg_if<JtagPayload> jtagPort;
Cycle accurate/simulation
– modeled as pins
    class SocCycleModel
      : public sc_core::sc_module
    {
      …
      sc_in<bool> jtagTck;
      sc_in<bool> jtagTms;
FPGA/Emulation/ASIC
– drives hardware interface
    static void
    jp1_ll_reset_jp1()
    {
      …
      write (lp, &data, sizeof (data));
      JP1_WAIT ();
Unified Debug Solution
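[Diagram: in the Firmware World, a Debugger (e.g. GDB) speaks a Debugger Protocol (e.g. GDB RSP) to the Unified JTAG Abstraction Layer. In the Hardware World, that layer reaches four kinds of target: a hand-written TLM model via TLM 2.0 JTAG, a simulation or CA model via JTAG simulation, emulation or FPGA via a JTAG driver, and silicon via a JTAG driver.]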
JTAG Abstraction
Generic Class Diagram
[Class diagram: RspServerSC, JtagSC and JtagRegister]
A generic GDB Remote Serial Protocol server communicates with a generic JTAG target
by passing a generic JTAG register.
Both the RSP server and JTAG target are abstract classes. Concrete derived classes are
created for specific architectures and specific JTAG targets.
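The abstract-class arrangement can be illustrated in plain C++ without SystemC. This is a minimal sketch, assuming a register object that carries a command, an address and data; only the name `JtagRegister` is taken from the class diagram, and its fields plus `JtagTarget`, `InMemoryJtagTarget` and `DebugServer` are hypothetical and are not the classes used by Embecosm.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Value passed between the debug server and the JTAG target.
struct JtagRegister {
    enum class Command { Read, Write };
    Command command;
    std::uint32_t address;
    std::uint32_t data;
};

// Abstract target: each concrete target (model, FPGA, silicon) overrides access().
class JtagTarget {
public:
    virtual ~JtagTarget() = default;
    virtual void access(JtagRegister& reg) = 0;
};

// Stand-in target backed by a map, playing the role of a simple model.
class InMemoryJtagTarget : public JtagTarget {
public:
    void access(JtagRegister& reg) override {
        if (reg.command == JtagRegister::Command::Write) {
            registers_[reg.address] = reg.data;
        } else {
            reg.data = registers_[reg.address]; // unwritten registers read as 0
        }
    }
private:
    std::map<std::uint32_t, std::uint32_t> registers_;
};

// Server side: knows nothing about which kind of target it is talking to.
class DebugServer {
public:
    explicit DebugServer(JtagTarget& target) : target_(target) {}

    void writeRegister(std::uint32_t address, std::uint32_t value) {
        JtagRegister reg{JtagRegister::Command::Write, address, value};
        target_.access(reg);
    }
    std::uint32_t readRegister(std::uint32_t address) {
        JtagRegister reg{JtagRegister::Command::Read, address, 0};
        target_.access(reg);
        return reg.data;
    }
private:
    JtagTarget& target_;
};
```

Swapping `InMemoryJtagTarget` for another class derived from `JtagTarget` changes the target without touching `DebugServer`, which is the point of the abstraction.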
JTAG Abstraction
Specific Targets
[Class diagram: RspServerSC, JtagSC and JtagRegister, with the target classes JtagTlmSC, JtagCycleModelSC, JtagVpiSC and Jtag2232SC]
A set of JTAG derived classes provides interfaces to common targets independent of the
processor architecture being supported.
JTAG Abstraction
Specific Architecture
[Class diagram: RspServerSC, JtagSC, JtagRegister, JtagTlmSC, JtagCycleModelSC, JtagVpiSC and Jtag2232SC, with the architecture specific classes ArchRspServerSC and ArchJtagRegType added]
Using SystemC TLM to model JTAG
Use extensions for JTAG specific features
– address, bit length, command
Allows use of generic payload
– maximum portability
[Class diagram: tlm_generic_payload, with get_command (), set_command (), get_address (), set_address (), get_data_length (), set_data_length () and get_extension (), has three 0..1 extensions:
– JtagPayloadAddress: getAddress (), setAddress (); values RESET, IR, DR
– JtagPayloadBitLength: getBitLength (), setBitLength ()
– JtagPayloadCommand: getCommand (), setCommand (); values IGNORE, READ, WRITE, WRITE_READ]
Example Adapter
[Diagram: a TLM thread makes a blocking TLM call into a TAP state machine running in a cycle accurate thread. The state machine drives the SystemC signals tck, tdi, tdo, tms and trst, after which the blocking TLM call returns.]
Reference: Application Note EAN5
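The TAP state machine at the centre of the adapter follows the standard JTAG (IEEE 1149.1) state diagram, in which the next state depends only on the current state and the value of TMS at each TCK rising edge. The following is a minimal sketch of that transition function; the enum and function names are hypothetical and are not taken from the application note.

```cpp
#include <cassert>

// The 16 states of the standard JTAG TAP controller.
enum TapState {
    TestLogicReset, RunTestIdle,
    SelectDrScan, CaptureDr, ShiftDr, Exit1Dr, PauseDr, Exit2Dr, UpdateDr,
    SelectIrScan, CaptureIr, ShiftIr, Exit1Ir, PauseIr, Exit2Ir, UpdateIr,
    TapStateCount
};

// Next state on a rising edge of TCK, indexed by [current state][TMS].
inline TapState nextTapState(TapState current, bool tms) {
    static const TapState table[TapStateCount][2] = {
        /* TestLogicReset */ {RunTestIdle, TestLogicReset},
        /* RunTestIdle    */ {RunTestIdle, SelectDrScan},
        /* SelectDrScan   */ {CaptureDr, SelectIrScan},
        /* CaptureDr      */ {ShiftDr, Exit1Dr},
        /* ShiftDr        */ {ShiftDr, Exit1Dr},
        /* Exit1Dr        */ {PauseDr, UpdateDr},
        /* PauseDr        */ {PauseDr, Exit2Dr},
        /* Exit2Dr        */ {ShiftDr, UpdateDr},
        /* UpdateDr       */ {RunTestIdle, SelectDrScan},
        /* SelectIrScan   */ {CaptureIr, TestLogicReset},
        /* CaptureIr      */ {ShiftIr, Exit1Ir},
        /* ShiftIr        */ {ShiftIr, Exit1Ir},
        /* Exit1Ir        */ {PauseIr, UpdateIr},
        /* PauseIr        */ {PauseIr, Exit2Ir},
        /* Exit2Ir        */ {ShiftIr, UpdateIr},
        /* UpdateIr       */ {RunTestIdle, SelectDrScan},
    };
    assert(current < TapStateCount);
    return table[current][tms ? 1 : 0];
}
```

An adapter thread could step this function once per `tck` cycle, setting `tms` to walk the controller to Shift-DR or Shift-IR before clocking data through `tdi` and `tdo`.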
Experience with OpenRISC
The OpenRISC 1000 Project
Objective to develop a family of open source RISC designs
– 32 and 64-bit architectures
– floating point support
– vector operation support
Key features
– fully free and open source
– linear address space
– register-to-register ALU operations
– two addressing modes
– delayed branches
– Harvard or Stanford memory MMU/cache architecture
– fast context switch
Looks rather like MIPS or DLX
The OpenRISC 1200
32-bit Harvard RISC architecture
– MIPS/DLX like instruction set
– first in OpenRISC 1000 family
– originally developed 1999-2001
Open source under the
– GNU Lesser General Public License
– allows reuse as a component
Configurable design
– caches and MMUs optional
– core instruction set
Source code Verilog 2001
– approx 32k lines of code
Full GNU tool chain and Linux port
– various RTOS ported as well
[Block diagram: OpenRISC 1200 containing CPU, ALU, Power Mgmt, JTAG Debug Unit, Tick Timer and PIC, with Inst MMU and Inst Cache on one WishBone interface and Data MMU and Data Cache on a second WishBone interface]
Hardware Development
Objective is to use an open source EDA tool chain
– back end tools for FPGA all proprietary
free (as in beer) versions available
– front end tools now have open source alternatives
OpenRISC 1000 simulation models
– Or1ksim: golden reference ISS
C/SystemC interpreting ISS, 2-5 MIPS
– Verilator cycle accurate model from the Verilog RTL
130kHz in C++ or SystemC
– Icarus Verilog event driven simulation
1.4kHz, 50x slower than commercial alternatives
All OpenRISC 1000 simulation models suitable for SW use
– all support GDB debug interface
The Software Tool Chain
A standard GNU tool chain
– binutils 2.20.1
– gcc 4.5.1
– gdb 7.2
– C and C++ language support
Library support
– static libraries only
– newlib 1.18.0 for bare metal (or32-elf-*)
– uClibc 0.9.32 for Linux applications (or32-linux-*)
Testing
– regression tested using Or1ksim (both tool chains)
– or32-linux-* regression tested on hardware
– or32-elf-* regression tested on a Verilator model
Board and OS Support
Boards with BSP implementations
– Or1ksim
– DE-nano
– Terasic DE-2
RTOS support
– FreeRTOS, RTEMS and eCos all ported
Linux support
– adopted into Linux 3.1 kernel mainline
– some limitations (kernel debug, ptrace)
– BusyBox as application environment
Debug interfaces
– JTAG for bare metal
– gdbserver over Ethernet for Linux applications
Comparative Regression Testing
of the OpenRISC 1200
=== gcc Summary ===         Golden SystemC TLM Model   Verilator SystemC RTL Model
# of expected passes        52753                      52677
# of unexpected failures    152                        228
# of expected failures      77                         77
# of unresolved testcases   122                        122
# of unsupported tests      716                        716
We can identify two types of problem
– tests which time out with RTL, where the timeout is not simply due to the slower model
– tests which give a different result with RTL
These are candidates for possible RTL errors
Used commercially by Adapteva Inc
– 50-60 RTL errors eliminated pre-tape out
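In the summaries above, the RTL model shows 228 − 152 = 76 more unexpected failures than the golden model, matching the 52753 − 52677 = 76 fewer expected passes. Picking out those tests can be sketched as a comparison of two result sets. This is a minimal sketch, assuming each run has been reduced to a map from test name to a status string such as "PASS" or "FAIL"; the type alias, function name and status strings are hypothetical and do not reflect a particular test harness format.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Test name -> status string, e.g. "PASS" or "FAIL" (hypothetical format).
typedef std::map<std::string, std::string> TestResults;

// Tests that pass on the golden model but not on the RTL model.
// These are the candidates for possible RTL errors.
std::vector<std::string> rtlOnlyFailures(const TestResults& golden,
                                         const TestResults& rtl) {
    std::vector<std::string> candidates;
    for (const auto& entry : golden) {
        if (entry.second != "PASS") {
            continue; // already failing on the reference, not RTL specific
        }
        auto it = rtl.find(entry.first);
        if (it != rtl.end() && it->second != "PASS") {
            candidates.push_back(entry.first);
        }
    }
    return candidates;
}
```

Each name returned still needs manual triage, since a timeout caused only by the slower model is not an RTL error.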
Summary and Acknowledgements
Summary
Future low power products will require a systems approach
– hardware and software engineers must work together
– the approach applies throughout the lifecycle
The greatest opportunity for power saving is in the software
– techniques for tackling this are still in their infancy
– we need breakthroughs in high level power modeling and simulation
We need a systems oriented tool chain
– geared to the needs of both software and hardware
– usable throughout the product lifecycle
Embecosm's unified debugging approach is an example
– allows software debugging throughout the lifecycle
The benefits can be seen already in the OpenRISC project
– hardware bugs identified by the software engineers
Acknowledgements
Most of the work described in the first section of this
presentation on energy aware computing is due to my colleague,
Dr Kerstin Eder of the University of Bristol.
OpenRISC is a community project, to which I am just one of the
contributors. It is the cumulative result of 12 years' work by a
very large number of people.
Thank You
www.embecosm.com
www.opencores.org