Holger Pirk

Associate Professor/Senior Lecturer in Computing

👨‍💻 About Me

Email: hlgr@imperial.ac.uk

Office:
Huxley Building Room 431
Imperial College London
180 Queen's Gate
London SW7 2RH, United Kingdom

Research Interests:

I am interested in all things data: analytics, transactions, systems, algorithms, data structures, processing models and everything in between. While some of my work targets "traditional" relational databases, my objective is to broaden the applicability of data management techniques. This naturally leads to research at the intersection of data management, compilers and computer architecture: I study the effective use of current and emerging hardware to improve the performance of data-intensive applications, as well as abstractions that make them easier to program. This means targeting new applications like visualization, games, IoT and AI and new platforms like compilers, GPUs or FPGAs, as well as hardware-conscious algorithms, new data processing paradigms, algebraic optimizations, cost models and code generation techniques.

Bio:

Before joining Imperial, I was a Postdoc at the Database group at MIT CSAIL. I spent my PhD years in the Database Architectures group at CWI in Amsterdam, resulting in a PhD from the University of Amsterdam in 2015. I received my master's degree (Diplom) in computer science at Humboldt-Universität zu Berlin in 2010.

🛠️ Projects

I work on many data management problems but usually focus on the systems side. The vehicle for my research is an umbrella project called BOSS. BOSS is a next-generation data management system that supports a variety of applications and provides many features we think a modern data management system should have.

📚 Publications

📄 BOSS - An architecture for database kernel composition

VLDB 2024

H Mohr-Daurat, X Sun, H Pirk

📄 Wisent: an in-memory serialization format for leafy trees

Joint Workshops at 49th International Conference on Very Large Data Bases (VLDBW’23) — Second International Workshop on Composable Data Management Systems (CDMS’23)

H Mohr-Daurat, H Pirk

📄 Collaborative Data Science using Scalable Homoiconicity

SIGMOD RECORD

H Pirk

📄 Homoiconicity For End-to-end Machine Learning with BOSS.

H Mohr-Daurat, H Pirk

📄 SCABBARD: Single-Node Fault-Tolerant Stream Processing

48th International Conference on Very Large Data Bases (VLDB)

G Theodorakis, F Kounelis, P Pietzuch, H Pirk

📄 High-Performance Tree Indices: Locality matters more than one would think.

T Kowalski, F Kounelis, H Pirk

📄 LightSaber: Efficient Window Aggregation on Multi-core Processors

ACM SIGMOD International Conference on Management of Data (SIGMOD)

G Theodorakis, A Koliousis, P Pietzuch, H Pirk

📄 SlideSide: a fast incremental stream processing algorithm for multiple queries

23rd International Conference on Extending Database Technology (EDBT)

G Theodorakis, P Pietzuch, H Pirk

📄 Accelerating the merge phase of sort-merge join

29th International Conference on Field-Programmable Logic and Applications (FPL)

P Papaphilippou, H Pirk, W Luk

📄 Thriving in the No Man's Land between compilers and databases

Conference on Innovative Data Systems Research

H Pirk, J Giceva, P Pietzuch

📄 Efficient Top-K Query Processing on Massively Parallel Hardware

44th ACM SIGMOD International Conference on Management of Data

A Shanbhag, H Pirk, S Madden

📄 Evaluating end-to-end optimization for data analytics applications in Weld

Proceedings of the VLDB Endowment

S Palkar, J Thomas, D Narayanan, P Thaker, R Palamuttam, P Negi, A Shanbhag, M Schwarzkopf, H Pirk, S Amarasinghe, S Madden, M Zaharia

📄 Hammer Slide: Work- and CPU-efficient Streaming Window Aggregation

G Theodorakis, A Koliousis, PR Pietzuch, H Pirk

📄 Locality-Adaptive Parallel Hash Joins Using Hardware Transactional Memory

7th International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures (ADMS) / International Workshop on In-Memory Data Management (IMDM)

A Shanbhag, H Pirk, S Madden

📄 Weld: Rethinking the interface between data-intensive applications

S Palkar, J Thomas, D Narayanan, A Shanbhag, R Palamuttam, H Pirk, M Schwarzkopf, S Amarasinghe, S Madden, M Zaharia

📄 Weld: A Common Runtime for High Performance Data Analytics

CIDR

S Palkar, J Thomas, A Shanbhag, H Pirk, M Schwarzkopf, S Amarasinghe, M Zaharia

📄 Non-Invasive Progressive Optimization for In-Memory Databases

PROCEEDINGS OF THE VLDB ENDOWMENT

S Zeuch, H Pirk, J-C Freytag

📄 Voodoo - a vector algebra for portable database performance on modern hardware

Proceedings of the VLDB Endowment

H Pirk, O Moll, M Zaharia, S Madden

📄 What Makes a Good Physical plan? - Experiencing Hardware-Conscious Query Optimization with Candomble

ACM SIGMOD International Conference on Management of Data

H Pirk, O Moll, S Madden

📄 ...like Commanding an Anthill: A Case for Micro-Distributed (Data) Management Systems

SIGMOD RECORD

H Pirk

📄 By their fruits shall ye know them: A Data Analyst's Perspective on Massively Parallel System Design.

H Pirk, S Madden, M Stonebraker

📄 The DBMS - your Big Data Sommelier

31st IEEE International Conference on Data Engineering

Y Kargin, M Kersten, S Manegold, H Pirk

📄 Database cracking: fancy scan, not poor man's sort!

H Pirk, E Petraki, S Idreos, S Manegold, ML Kersten

📄 Waste Not ... Efficient Co-Processing of Relational Data

IEEE 30th International Conference on Data Engineering (ICDE)

H Pirk, S Manegold, M Kersten

📄 CPU and Cache Efficient Management of Memory-Resident Databases

29th IEEE International Conference on Data Engineering (ICDE)

H Pirk, F Funke, M Grund, T Neumann, U Leser, S Manegold, A Kemper, M Kersten

📄 Hardware-oblivious parallelism for in-memory column-stores

Proceedings of the VLDB Endowment

M Heimel, M Saecker, H Pirk, S Manegold, V Markl

📄 Instant-On Scientific Data Warehouses Lazy ETL for Data-Intensive Research

6th BIRTE International Workshop Held at the 38th International Conference on Very Large Databases (VLDB)

Y Kargin, H Pirk, M Ivanova, S Manegold, M Kersten

📄 Building Virtual Earth Observatories Using Ontologies and Linked Geospatial Data.

M Koubarakis, M Karpathiotakis, K Kyzirakos, C Nikolaou, S Vassos, G Garbis, M Sioutis, K Bereta, S Manegold, ML Kersten, M Ivanova, H Pirk, Y Zhang, C Kontoes, I Papoutsis, T Herekakis, D Michail, M Datcu, G Schwarz, CO Dumitru, D Espinoza-Molina, K Molch, UD Giammatteo, M Sagona, S Perelli, E Klien, T Reitz, R Gregor

📄 Scalable Generation of Synthetic GPS Traces with Real-Life Data Characteristics.

K Bösche, T Sellam, H Pirk, R Beier, P Mieth, S Manegold

📄 X-device query processing by bitwise distribution.

H Pirk, T Sellam, S Manegold, ML Kersten

📄 Accelerating Foreign-Key Joins using Asymmetric Memory Channels.

H Pirk, S Manegold, ML Kersten

📄 Werkzeuggestützte interaktive Formalisierung textueller Anwendungsfallbeschreibungen für den Systemtest.

M Friske, H Pirk

🧹 Academic Service

  • ICDE Demonstration Chair 2022
  • SIGMOD Reproducibility Co-Chair 2022
  • General Chair of BICOD 2021 (Co-Chaired with Thomas Heinis)
  • Area Editor for Information Systems, 2020-today
  • Associate Chair for ICDE 2022
  • Webchair of SIGMOD 2019
  • Core Member of SIGMOD Program Committee 2019
  • Member of VLDB Program Committee 2016, 2017, 2018, 2020, 2021
  • Member of SIGMOD Program Committee 2016, 2017, 2018, 2020, 2021
  • Member of ICDCS Program Committee 2018
  • Member of ICDE Program Committee 2016, 2020
  • Member of EDBT Program Committee 2020
  • Member of ICDE Industrial Track Program Committee 2018
  • Member of SIGMOD/DaMoN Program Committee 2015 & 2018
  • Member of VLDB PhD Workshop Program Committee 2016
  • Member of VLDB Demo Program Committee 2016
  • Member of ICDE PhD Workshop Program Committee 2017
  • Member of ICDE Demo Program Committee 2016

💬 Invited Talks

  • High-performance multi-paradigm database systems, Huawei Science & Technology Day, Edinburgh, 2021
  • Dark Silicon–A currency we do not control, KTH, Stockholm, 2020
  • Dark Silicon–A currency we do not control, Invited Fresh Thinking Keynote at SIGMOD DaMoN Workshop, 2019
  • Invited Participation, Microsoft Faculty Research Summit, 2018
  • Hardware-Conscious Data Processing Systems, Universität des Saarlandes, 2018
  • Hardware-Conscious Data Processing Systems, Technische Universität Dresden, 2018
  • Hardware-Conscious Data Processing Systems, Technische Universität Dortmund, 2018
  • Hardware-Conscious Data Processing Systems, Universität Tübingen, 2018
  • Hardware-Conscious Data Processing Systems, University of Washington, 2018
  • Hardware-Conscious Data Processing Systems, Oxford University, 2018
  • Hardware-Conscious Data Processing Systems, SAP HANA Tech Days, 2018
  • Voodoo - A Kernel For Database Performance Engineering, Harvard University, 2016
  • Voodoo - A Kernel For Database Performance Engineering, Yale University, 2016
  • Voodoo - A Kernel For Database Performance Engineering, Brown University, 2015
  • A mind like water - Increasing DBMS Resilience without Sacrificing Performance, Imperial College London, 2015
  • A mind like water - Increasing DBMS Resilience without Sacrificing Performance, EPFL, 2015
  • Waste Not, Want Not - Efficient Co-Processing of Relational Data, ETH Zürich, 2014
  • Waste Not, Want Not - Efficient Co-Processing of Relational Data, Oracle Labs, 2013
  • Waste Not, Want Not - Efficient Co-Processing of Relational Data, IBM Almaden, 2013
  • Hardware-Conscious Cost Modelling through the Ages, ETH Zürich, 2013
  • Cache Conscious Data Layouting for In-Memory Databases, Humboldt-Universität zu Berlin, 2012
  • Cache Conscious Data Layouting for In-Memory Databases, Technische Universität München, 2012

🧪 Patents

  • Marion Behnen, Richard Cole, Qi Jin, Timo Pfahl and Holger Pirk: Data Analysis using Facet Attributes, US Patent, 2011
  • Marion Behnen, Qi Jin, Timo Pfahl and Holger Pirk: Cube Faceted Data Analysis, US Patent, 2010

🏛️ Supervision

PhD Students

  • David Loughlin: High-Performance Lake-Data Management – ongoing
  • Hubert Mohr-Daurat: Homoiconic Data as a Basis for Data Cleaning – ongoing
  • Fotios Kounelis: Transparent Compression in General Purpose Programming Languages – ongoing
  • Ahmad Khazaie: Index Structures for Worst-case Optimal Joins – ongoing
  • Giorgos Theodorakis: High-performance Stream Processing (jointly with Peter Pietzuch) – ongoing

Postdocs

  • Andrea Piermarteri: Data Visualization using a Homoiconic Data Representation
  • Xuan Sun: Data Compression in Composable DBMSs

Master's Students

  • Hannes Hertach, MEng Computing: Query Compiler for a Symbolic Database Management System
  • Tiger Wang, MEng Computing: Homoiconic Symbolically Distributed Processing
  • Abel Shields, MEng Computing: Evaluating Symbolic Programs on GPUs
  • Alexandru-Petre Cazan, MEng Computing: Symbolic Optimization of Database Queries
  • Christopher Battarbee, MEng Computing: Profile-Guided Optimization using Database techniques
  • Liam Pilot, MEng Computing: Accelerating Stream Processing with RDMA
  • Mayank Surana, EE: BW-Trees on FPGAs and GPUs
  • Marek Beseda, MEng Computing: NUMA-Aware Stream Processing
  • William Woodacre, MEng: Deterministic concurrency control for transaction processing systems on FPGAs
  • Emma Gospodinova, MEng: BW-Trees on FPGAs
  • Zicong Ma, JMC: GamesBench: A Benchmark for Streaming Analytics of Strategy Games
  • Oliver Brown, MEng Computing: Powerpipes
  • Charith Amarasinghe, MSc: CLOPS: A Proxy Testbed for Cloud Storage
  • Celie Valentiny, MSc: Personal Tracking Data Recommender/Awareness Demonstration for Imperial Festival
  • Yao Chen, MSc: Predicting access latencies of modern storage devices
  • Jeng Wong, MSc: Massively Parallel Stream Ingestion
  • Andrew Chow, MSc: Personal Tracking Data Recommender/Awareness Demonstration for Imperial Festival
  • Armand Cadet, MSc: Personal Tracking Data Recommender/Awareness Demonstration for Imperial Festival

Master's-Level Group Projects

  • Alexander Harkness et al., MEng Group Project: Developing a collaborative drawing app using CRDTs
  • Kapilan Cholan et al., MEng Group Project: Developing a collaborative drawing app using CRDTs
  • Jordan Spooner et al., MEng Group Project: Building an Efficient Query Processor by Generating OpenCL Code from Voodoo Vector Algebra

Bachelor's Students

  • Robert Moore, BEng Computing: Adaptive Compression for Graph Processing
  • Ki Cheuk, BEng: Developing a hardware-conscious cost model for parallel, data-intensive applications

Ph.D. Assessment

  • Timo Kersten, TU Muenchen, 2021
  • Matthew Pugh, University of Edinburgh, 2021 (thesis currently in corrections phase)
  • Christian Priebe, Imperial College London, 2020

🪐 Teaching

  • Nomination for Student Choice Award for Outstanding Teaching 2018, 2019 & 2020
  • Advanced Databases at Imperial College, 2017 - 2022
  • Performance Engineering at Imperial College, 2018 - 2022
  • Introduction to Object-Oriented Programming at Imperial College, 2018 - 2021
  • Contributed lectures to Database Systems at MIT, 2015 & 2016
  • Teaching assistant for Advanced Software Engineering at HPI, 2009
  • Software Engineering Best Practices at Humboldt-University, 2008

🏫 University Service

  • Coordinator for the departmental colloquium at the Department of Computing
  • Co-chair of the Athena SWAN Committee (Imperial Computing currently holds Athena SWAN Bronze status and is working to upgrade to Silver)
  • Member of undergrad admission panel (until 2021)
  • Departmental Knowledge Management Officer (since 2021)

🎓 Education

Ph.D. at the Database Architectures group at CWI/University of Amsterdam (2011 - 2014)

Thesis: Waste Not, Want Not! Managing Relational Data In Asymmetric Memories

· Advisors: Martin Kersten & Stefan Manegold

MSc (Diplom) in computer science and psychology at Humboldt-Universität zu Berlin (2003 - 2010)

Thesis: Cache Conscious Data Layouting For In-Memory Databases

· Advisor: Ulf Leser

🏢 Professional Experience

Senior Lecturer at Imperial College, London, UK

06/2021 - today

Lecturer at Imperial College, London, UK

09/2017 - 05/2021

Consulting Researcher at Microsoft Research, Cambridge, UK

02/2018 - 09/2019

Visiting Researcher at the DMX group at Microsoft Research, Redmond, USA

05/2016 - 08/2017

Postdoctoral researcher at the Database group at MIT, Cambridge, USA

12/2014 - 04/2017

Ph.D. candidate at the Database Architectures group at CWI, Amsterdam

10/2010 - 09/2014

Software architect at Kontacts IT-Solutions GmbH, Potsdam, Germany

10/2009 - 09/2010

Teaching assistant at the Hasso-Plattner-Institute, Potsdam, Germany

10/2009 - 02/2010

Research student at the EPIC group at the Hasso-Plattner-Institute, Potsdam, Germany

01/2009 - 09/2009

Research assistant at the EPIC group at the Hasso-Plattner-Institute, Potsdam, Germany

03/2008 - 08/2008

Research assistant at the Knowledge Management in Bio-Informatics group at Humboldt-Universität zu Berlin

12/2007 - 10/2008

Research assistant (remote) for the IBM Data Warehousing group (Berlin, Germany)

12/2006 - 05/2007

Research Intern at IBM Silicon Valley Labs, San Jose, CA

04/2006 - 10/2006

Research assistant at the Fraunhofer FIRST, Berlin

10/2004 - 03/2006

💷 Awards & Funding

  • 2021, Principal Investigator, EPSRC New Investigator Award "Bespoke Compression for General-Purpose Programming Languages". Duration: 2 years
  • 2020, Co-Investigator, Innovate UK Project "Energy Catalyst 7". Duration: 2 years
  • 2020, Co-Investigator, Innovate UK Project "HyAI - Hydrogen AI". Duration: 1 year
  • 2019, Principal Investigator, Oracle Labs-funded Project "Compression in GraalVM/Truffle". Duration: 3.5 years
  • 2018, Hardware grant from Intel/Altera
  • 2017, Hardware grant from NVIDIA