Exploring parallelism in system-level models by assessing PDES performance (2018-2020, UCI)
The emergence of data-intensive applications, such as Deep Neural Networks (DNNs), demands early attention to parallelism opportunities in embedded system design and modeling. Electronic System-Level (ESL) design using SystemC Transaction-Level Modeling (TLM) makes it possible to model parallelism explicitly early in the design flow. However, the choice of synchronization and communication mechanisms between concurrent modules has a significant impact on the parallelism available in TLM models.
In this project, we propose and analyze a set of non-invasive, standard-compliant modeling techniques to increase parallelism in IEEE SystemC TLM-1 and TLM-2.0 models. As shown in the left figure below, simulator parallelism, model parallelism, and simulation speedup form a 3-dimensional space. As the red arrow indicates, the maximum simulation speedup requires both higher model parallelism and higher simulator parallelism. Moreover, increasing the parallelism exposed along the model dimension lets the simulator better exploit its parallelization capabilities and approach that maximum speedup.
To demonstrate the effectiveness of our approach, we measure the performance of aggressive out-of-order parallel discrete event simulation (PDES) in the Recoding Infrastructure for SystemC (RISC). These measurements let us analyze the parallelism in the models and quantify how much it increases in the improved SystemC TLM-1 and TLM-2.0 models. In particular, we demonstrate the impact of different synchronization mechanisms on the exposed parallelism using six modeling styles of a state-of-the-art DNN, GoogLeNet (middle figure). The right figure shows the top-level test benches for TLM-1 (a) and TLM-2.0 (b): in the TLM-1 model, the stimulus and monitor modules are connected through FIFO channels, while in the TLM-2.0 model they are connected through sockets to a shared memory inside the design under test (DUT).
The results support four hypotheses: (H1) less restrictive transaction types enable higher parallelism; (H2) abstract TLM-1 models carry less workload than memory-accurate TLM-2.0 models; (H3) higher speed in aggressive parallel simulation is a significant indicator of a higher level of parallelism in the design; and (H4) improved RISC scheduling algorithms yield higher simulation speedup.
Left figure: Simulator parallelism, model parallelism, and simulation speedup form a 3-dimensional space
Middle figure: GoogLeNet network
Right figure: Top-level TLM test bench for DNN: (a) TLM-1, (b) TLM-2.0
Project Publications:
- E. Arasteh, R. Dömer
An Untimed SystemC Model of GoogLeNet
in "Analysis, Estimations, and Applications of Embedded Systems"
edited by M. Wehrmeister, M. Kreutz, M. Götz, S. Henkler, A. Pimentel, and A. Rettberg,
reprint of best papers at IESS 2019, Springer, February 2023. (ISBN: 978-3-031-26499-3)
- E. Arasteh, R. Dömer
Improving Parallelism in System Level Models by Assessing PDES Performance
Proceedings of the Forum on Specification and Design Languages, Antibes, France, September 2021.
- E. Arasteh, R. Dömer
Systematic Evaluation of Six Models of GoogLeNet using PDES
Center for Embedded and Cyber-Physical Systems, Technical Report 21-03, September 2021.
- Z. Cheng, E. Arasteh, R. Dömer
Event Delivery using Prediction for Faster Parallel SystemC Simulation
Asia and South Pacific Design Automation Conference, Beijing, China, January 2020.
- D. Mendoza, Z. Cheng, E. Arasteh, R. Dömer
Lazy Event Prediction using Defining Trees and Schedule Bypass for Out-of-Order PDES
Design, Automation and Test in Europe (DATE) Conference, Grenoble, France, March 2020.
- R. Dömer, Z. Cheng, D. Mendoza, E. Arasteh
Pushing the Limits of Parallel Discrete Event Simulation for SystemC
A Journey of Embedded and Cyber-Physical Systems, Springer Nature, Switzerland, August 2020.
- E. Arasteh, R. Dömer
Untimed SystemC Model of GoogLeNet
Proceedings of the International Embedded Systems Symposium, Springer, Friedrichshafen, Germany, September 2019.
Controller architecture of enterprise solid-state drive (Summer 2021, Samsung)
Design the SoC and firmware architecture of an enterprise SSD controller.
Design and evaluate resource allocation strategies to guarantee SoC performance in a multi-tenant cloud.
Design system-level contention models of a modern flash storage device.
ASIC design of speech recognition processor (Summer 2019, Syntiant)
Design a cryptography block using high-level synthesis (HLS).
Evaluate the performance of the HLS-generated RTL implementation to optimize speed and area.
Design a system-level test bench for the next-generation Neural Decision Processor (NDP).
ASIC design and verification of network video camera chip (2015-2018, Canon Inc. - Axis Communications)
Design a virtual prototype for ARTPEC-7.
Develop infrastructure for block-, subsystem-, and SoC-level verification using SystemVerilog and UVM.
Use formal verification to measure performance.
DSP software design for multi-standard 2G/3G/4G cellular modem (2012-2014, Ericsson Modem)
Design a real-time OS and device driver for the Embedded Vector Processor (EVP).
Develop low-level software to verify subsystems inside the digital baseband, including DSP, security, and power management.
Troubleshoot system errors raised by either the EVP software or hardware.
Software design for airbag bike helmet (2011-2012, Hövding)
Implement a secure in-field and in-production firmware upgrade to protect Hövding's proprietary algorithm.
Design a secure boot process and a customized bootloader.
ASIC design and verification of Video Display Controller (VDC) (2010-2011, ARM)
Design and synthesize an AMBA AXI VDC that supports paged memory with high average and peak latencies in a 65nm CMOS process.
Perform requirements analysis, RTL development, functional verification, synthesis for a 65nm CMOS process, and FPGA validation in hardware.
Develop a Linux device driver, a C-based user interface, and MATLAB test scripts in software.
Project Publication: