Dr Michael Shekelyan

Lecturer in Computer Science (T&R)

Email: m.shekelyan@qmul.ac.uk
Room Number: Peter Landin Building, 4th Floor
Website: https://shekelyan.science
Office Hours: Email me to arrange an appointment.

Profile
Teaching
Research
Publications

Profile

I am a Lecturer (Assistant Professor) in Computer Science. I am part of the theory group and aim to develop practical algorithms & data structures that can be used in a wide range of applications or theoretical foundations that promise substantial practical impact. I participate in many research communities that are interested in learning from data. Most of my works have been presented in the go-to data management conferences (VLDB, ACM SIGMOD/PODS, IEEE ICDE, EDBT), but I also regularly serve as a reviewer for leading machine learning and data mining conferences (NeurIPS, ICML, ICLR, AISTATS, ACM KDD) as well as many journals. My main goal is to progress technology towards reliably and responsibly capturing crucial (factual) insights from data, i.e., reactive interfaces for data exploration (better algorithms, data structures & data approximations), and privacy-preserving analysis & machine learning (better theory, hypothesis tests & randomised algorithms).

Open Position (PhD Studentship)

Privacy-Preserving Algorithms: Unlocking Data Sharing for Medical Sciences & Machine Learning

PhD studentship: covers 3 years stipend (currently £20,662 per year in 2023/2024) and fees
Supervisor: Dr Michael Shekelyan
Location: London, UK
Applications: open to UK home students (see TGC 5.2 Student eligibility) till 31st January 2024
- More details and how to apply
- Expected start date: September 2024
Description:

Statistical analysis & machine learning approaches can only be as good as the underlying data and harder tasks call for a larger collection of accurate data. Collectively, health organisations, companies and service providers collect massive amounts of data from patients, employees and users, but on an individual level, a single hospital, company or service provider only accesses their smaller subset. Data sharing partnerships are challenging due to distrust, varying quality standards, legal obligations and ethical considerations aimed to protect the privacy of the person whose confidential information is collected. Naive anonymisation techniques may remove names, dates and identifiers, but overlook quasi-identifiers leveraging contextual knowledge, i.e., rare combinations of facts pointing to a particular person when considering additional information. Sophisticated frameworks like differential privacy & federated learning employ randomised algorithms that satisfy formal privacy guarantees & privacy-enhancing technologies that safely aggregate data in a decentralised fashion using cryptographic techniques. The studentship can take many directions such as mathematical foundations for privacy-preserving algorithms, intuitively explainable formal privacy guarantees, federated learning systems satisfying legal requirements, and other practical or theoretical solutions towards privacy-preserving data sharing. Contact m.shekelyan@qmul.ac.uk for further information or questions about potential directions.

Undergraduate Teaching

Semester A

Lecturer for ECS519U Database Systems (Mile End campus)

Semester B

Module Organiser for ECS518U Operating Systems (Mile End campus)
Module Organiser for IOT518U Operating Systems (London City Institute of Technology)

Further Material for Operating Systems

Accompanying Quizzes and Links: https://fork.science
Recommended textbook: http://ostep.org/ (goes beyond lecture)

Further Reading for Database Systems

Introduction

^{[Wikipedia articles: Tabulating machine, Flat-file database, Comma-separated values, Database, Outline of databases.]}

Hollerith, H. (1889). U.S. Patent No. 395,781. Washington, DC: U.S. Patent and Trademark Office. https://www.census.gov/history/pdf/hollerith_patent_01081889.pdf

Idreos, S., Alagiannis, I., Johnson, R., & Ailamaki, A. (2011). Here are my data files. here are my queries. where are my results?. In Proceedings of 5th Biennial Conference on Innovative Data Systems Research. https://stratos.seas.harvard.edu/sites/scholar.harvard.edu/files/cidr2011.pdf

Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom. 2001. Database Systems: The Complete Book. Prentice Hall PTR, USA. Chapter “The Worlds of Database Systems”. http://infolab.stanford.edu/~ullman/fcdb/ch1.pdf

Abiteboul, S., Hull, R., & Vianu, V. (1995). Foundations of databases (Vol. 8). Reading: Addison-Wesley. Chapter 1 - Database Systems. http://webdam.inria.fr/Alice/pdfs/Chapter-1.pdf

Connolly, T. M., & Begg, C. E. (2005). Database systems: a practical approach to design, implementation, and management. Pearson Education. Chapter “Introduction to Databases”.

Navigational (IDS) & Hierarchical (IBM IMS) Database Models (1960s)

^{[Wikipedia articles: Navigational database, Integrated Data Store, Hierarchical database model, IBM information Management System.]}

Bachman, C. W. (1969). Data structure diagrams. ACM SIGMIS Database: The DATABASE for Advances in Information Systems, 1(2), 4-10. https://dl.acm.org/doi/pdf/10.1145/1017466.1017467

Bachman, C. W. The origin of the integrated data store (IDS): The first direct-access DBMS. IEEE Annals of the History of Computing, 31(4), 42-54. https://tschwarz.mscs.mu.edu/Classes/DB23/HW/bachmanIDS.pdf

IBM. Introduction to IMS - History of IMS: Beginnings at NASA. https://www.ibm.com/docs/en/zos-basic-skills?topic=now-history-ims-beginnings-nasa

Connolly, T. M., & Begg, C. E. (2005). Database systems: a practical approach to design, implementation, and management. Pearson Education. Chapter “Introduction to Databases” - Section “History of Database Management Systems”.

Entity-Relationship (ER) Model

^{[Wikipedia articles: Entity–relationship model, Database design.]}

Chen, P. P. S. (1976). The entity-relationship model—toward a unified view of data. ACM transactions on database systems (TODS), 1(1), 9-36. https://dl.acm.org/doi/pdf/10.1145/320434.320440

Chen, P. P. S. (1997). English, Chinese and ER diagrams. Data & Knowledge Engineering, 23(1), 5-16. https://bit.csc.lsu.edu/~chen/pdf/ER_C.pdf

Chen, P. (2002). Entity-relationship modeling: historical events, future trends, and lessons learned. Software pioneers: contributions to software engineering, 296-310 https://bit.csc.lsu.edu/~chen/pdf/Chen_Pioneers.pdf.

Connolly, T. M., & Begg, C. E. (2005). Database systems: a practical approach to design, implementation, and management. Pearson Education. Chapter “Entity-Relationship Modeling” & Appendix “Alternative ER Modeling Notations”.

Relational Model, Relational Algebra & Relational Schema

^{[Wikipedia articles: Relational model, Candidate key, Foreign key, Relational algebra, Database schema.]}

Codd, E. F. (1970). A relational model of data for large shared data banks. Communications of the ACM, 13(6), 377-387. https://dl.acm.org/doi/pdf/10.1145/362384.362685

Astrahan, M. M., Blasgen, M. W., Chamberlin, D. D., Eswaran, K. P., Gray, J. N., Griffiths, P. P., ... & Watson, V. (1976). System R: Relational approach to database management. ACM Transactions on Database Systems (TODS), 1(2), 97-137. https://dl.acm.org/doi/pdf/10.1145/320455.320457

Codd, E. F. (1990). The relational model for database management: version 2. Addison-Wesley Longman Publishing Co., Inc.. https://dl.acm.org/doi/pdf/10.5555/77708

Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom. 2001. Database Systems: The Complete Book. Prentice Hall PTR, USA. Chapter “Relational Database Modeling”. http://infolab.stanford.edu/~ullman/fcdb/ch2.pdf

Abiteboul, S., Hull, R., & Vianu, V. (1995). Foundations of databases (Vol. 8). Reading: Addison-Wesley. Chapter 3 - The Relational Model http://webdam.inria.fr/Alice/pdfs/Chapter-3.pdf

Connolly, T. M., & Begg, C. E. (2005). Database systems: a practical approach to design, implementation, and management. Pearson Education. Chapters “Relational Model” and following.

Structured Query Language (SQL)

^{[Wikipedia articles: SQL, SQL:2023, Category:SQL, Join (SQL).]}

Chamberlin, D. D., & Boyce, R. F. (1974, May). SEQUEL: A structured English query language. In Proceedings of the 1974 ACM SIGFIDET (now SIGMOD) workshop on Data description, access and control (pp. 249-264). https://dl.acm.org/doi/pdf/10.1145/800296.811515

Oracle (2023). Oracle® Database - SQL Language Reference. https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlrf

Abiteboul, S., Hull, R., & Vianu, V. (1995). Foundations of databases (Vol. 8). Reading: Addison-Wesley. Chapter 7 - Notes on Practical Languages http://webdam.inria.fr/Alice/pdfs/Chapter-7.pdf

Guagliardo, P., & Libkin, L. (2017). A formal semantics of SQL queries, its validation, and applications. Proceedings of the VLDB Endowment, 11(1), 27-39. https://www.vldb.org/pvldb/vol11/p27-guagliardo.pdf

Connolly, T. M., & Begg, C. E. (2005). Database systems: a practical approach to design, implementation, and management. Pearson Education. Chapters “SQL: Data Manipulation” and following.

Database Normalisation

^{[Wikipedia articles: Database normalization, First normal form, Second normal form Third normal form.]}

Kent, W. (1983). A simple guide to five normal forms in relational database theory. Communications of the ACM, 26(2), 120-125. https://dl.acm.org/doi/pdf/10.1145/358024.358054

Newcomer, L. (2005). Normalization for Normal people Understanding Algorithms for Getting to 3NF. ASEE CoED Journal Volume 15 / Number 1. https://coed.asee.org/wp-content/uploads/2020/09/6-Normalization-for-Normal-people-Understanding-Algorithms-for-Getting-to-3NF.pdf

Abiteboul, S., Hull, R., & Vianu, V. (1995). Foundations of databases (Vol. 8). Reading: Addison-Wesley. Chapter 11 - Design and Dependencies. http://webdam.inria.fr/Alice/pdfs/Chapter-11.pdf

Connolly, T. M., & Begg, C. E. (2005). Database systems: a practical approach to design, implementation, and management. Pearson Education. Chapters “Normalisation” & “Advanced Normalisation”.

Theory behind Normalisation: Lossless Join Decomposition & Chase Algorithm

^{[Wikipedia articles: Lossless Join Decomposition, Chase (algorithm).]}

Aho, A. V., Beeri, C., & Ullman, J. D. (1979). The theory of joins in relational databases. ACM Transactions on Database Systems (TODS), 4(3), 297-314. https://dl.acm.org/doi/pdf/10.1145/320083.320091

Maier, D., Mendelzon, A. O., & Sagiv, Y. (1979). Testing implications of data dependencies. ACM Transactions on Database Systems (TODS), 4(4), 455-469. https://dl.acm.org/doi/pdf/10.1145/320107.320115

Abiteboul, S., Hull, R., & Vianu, V. (1995). Foundations of databases (Vol. 8). Reading: Addison-Wesley. Chapter 11 - Design and Dependencies. http://webdam.inria.fr/Alice/pdfs/Chapter-11.pdf

Connolly, T. M., & Begg, C. E. (2005). Database systems: a practical approach to design, implementation, and management. Pearson Education. Chapter “Normalization” - Section “Modification Anomalies”.

Enhanced Entity-Relationship (E-ER) Model & Unified Modeling Language (UML)

^{[Wikipedia articles: Enhanced Entity–relationship model, Unified Modeling Language.]}

Markowitz, V. M., & Shoshani, A. (1992). Representing extended entity-relationship structures in relational databases: A modular approach. ACM Transactions on Database Systems (TODS), 17(3), 423-464. https://dl.acm.org/doi/pdf/10.1145/132271.132273

Saiedian, H. (1997). An evaluation of extended entity-relationship model. Information and software Technology, 39(7), 449-462. https://people.eecs.ku.edu/~saiedian/Pub/Journal/1997-Saiedian-IST-ER.pdf

Marcos, E., Vela, B., & Cavero, J. M. (2003). A methodological approach for object-relational database design using UML. Software and Systems Modeling, 2, 59-72. https://link.springer.com/content/pdf/10.1007/s10270-002-0001-y.pdf

Connolly, T. M., & Begg, C. E. (2005). Database systems: a practical approach to design, implementation, and management. Pearson Education. Chapter “Enhanced Entity-Relationship Model”.

Object-Relational Database Management Systems (ORDBMS)

^{[Wikipedia articles: DB-Engines ranking, Object-Relational database, Comparison of object-relational database management systems.]}

Akhtar, A. (2023). Popularity Ranking of Database Management Systems. arXiv preprint arXiv:2301.00847. https://arxiv.org/pdf/2301.00847.pdf

Oracle (2023). Oracle® Database - Object-Relational Developer's Guide. https://docs.oracle.com/en/database/oracle/oracle-database/21/adobj.

Oracle (2023). Oracle® Database - Database Concepts. https://docs.oracle.com/en/database/oracle/oracle-database/21/cncpt

Connolly, T. M., & Begg, C. E. (2005). Database systems: a practical approach to design, implementation, and management. Pearson Education. Chapter “Object-Relational DBMSs”.

Transaction Processing & Benchmarks

^{[Wikipedia articles: ACID, Database transaction, Transaction processing, Two-phase locking, Transaction Processing Performance Council.]}

Haerder, T., & Reuter, A. (1983). Principles of transaction-oriented database recovery. ACM computing surveys (CSUR), 15(4), 287-317. https://dl.acm.org/doi/pdf/10.1145/289.291

Gray, J. (1981, September). The transaction concept: Virtues and limitations. In VLDB (Vol. 81, pp. 144-154). https://people.eecs.berkeley.edu/~kubitron/courses/cs262a-F21/handouts/papers/theTransactionConcept.pdf

Barthels, C., Müller, I., Taranov, K., Alonso, G., & Hoefler, T. (2019). Strong consistency is not hard to get: Two-Phase Locking and Two-Phase Commit on Thousands of Cores. Proceedings of the VLDB Endowment, 12(13), 2325-2338. https://www.vldb.org/pvldb/vol12/p2325-barthels.pdf

Boncz, P., Neumann, T., & Erling, O. (2013, August). TPC-H analyzed: Hidden messages and lessons learned from an influential benchmark. In Technology Conference on Performance Evaluation and Benchmarking (pp. 61-76). Cham: Springer International Publishing. https://homepages.cwi.nl/~boncz/snb-challenge/chokepoints-tpctc.pdf

Connolly, T. M., & Begg, C. E. (2005). Database systems: a practical approach to design, implementation, and management. Pearson Education. Chapter “Transaction Management”.

NoSQL (Key-Value Store, Graph Database etc) & Semistructured (XML, JSON)

^{[Wikipedia articles: NoSQL, Neo4j, Bigtable, Semi-structured Data.]}

Stonebraker, M. (2009). The 'No SQL' Discussion Has Nothing to Do With SQL. In BLOG@CACM https://cacm.acm.org/blogs/blog-cacm/50678-the-no-sql-discussion-has-nothing-to-do-with-sql

Strauch, C., Sites, U. L. S., & Kriha, W. (2011). NoSQL databases. Lecture Notes, Stuttgart Media University, 20(24), 79. https://www.christof-strauch.de/nosqldbs.pdf

Lissandrini, M., Brugnara, M., & Velegrakis, Y. (2018). Beyond macrobenchmarks: microbenchmark-based graph database evaluation. Proceedings of the VLDB Endowment, 12(4), 390-403. https://www.vldb.org/pvldb/vol12/p390-lissandrini.pdf

Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., ... & Gruber, R. E. (2008). Bigtable: A distributed storage system for structured data. ACM Transactions on Computer Systems (TOCS), 26(2), 1-26.https://research.google/pubs/pub27898/

Connolly, T. M., & Begg, C. E. (2005). Database systems: a practical approach to design, implementation, and management. Pearson Education. Chapter “Semistructured Data and XML”.

Distributed Systems & Basically Available Soft-state Eventually consistent (BASE)

^{[Wikipedia articles: CAP theorem, Eventual consistency.]}

Brewer, E. (2017). Spanner, truetime and the cap theorem. https://storage.googleapis.com/pub-tools-public-publication-data/pdf/45855.pdf

Vogels, W. (2009). Eventually consistent. Communications of the ACM, 52(1), 40-44. https://dl.acm.org/doi/pdf/10.1145/1435417.1435432

Connolly, T. M., & Begg, C. E. (2005). Database systems: a practical approach to design, implementation, and management. Pearson Education. Chapter “Replication and Mobile Databases”.

Data Warehousing, Online Analytical Processing (OLAP), Main Memory Database Systems

^{[Wikipedia articles: Data warehouse, Online analytical processing, In-memory database, Column-oriented DBMS.]}

Garcia-Molina, H., & Salem, K. (1992). Main memory database systems: An overview. IEEE Transactions on knowledge and data engineering, 4(6), 509-516. https://pages.cs.wisc.edu/~jhuang/qual/main-memory-db-overview.pdf

Kemper, A., & Neumann, T. (2011, April). HyPer: A hybrid OLTP&OLAP main memory database system based on virtual memory snapshots. In 2011 IEEE 27th International Conference on Data Engineering (pp. 195-206). IEEE. https://cs.brown.edu/courses/cs227/archives/2012/papers/olap/hyper.pdf

Chaudhuri, S., & Dayal, U. (1997). An overview of data warehousing and OLAP technology. ACM Sigmod record, 26(1), 65-74. https://dl.acm.org/doi/pdf/10.1145/248603.248616

Armenatzoglou, N., Basu, S., Bhanoori, N., Cai, M., Chainani, N., Chinta, K., ... & Terry, D. (2022, June). Amazon Redshift re-invented. In Proceedings of the 2022 International Conference on Management of Data (pp. 2205-2217). https://assets.amazon.science/93/e0/a347021a4c6fbbccd5a056580d00/sigmod22-redshift-reinvented.pdf

Hellerstein, J. M., & Stonebraker, M. (Eds.). (2005). Chpter 4: New DBMS Architectures. Readings in database systems. MIT press. http://www.redbook.io/pdf/ch4-newdbms.pdf

Connolly, T. M., & Begg, C. E. (2005). Database systems: a practical approach to design, implementation, and management. Pearson Education. Chapters “Data Warehousing Concept” and following.

Database Administration, Ethics & General Data Protection Regulation (GDPR)

^{[Wikipedia articles: Database administration, Big data ethics, General Data Protection Regulation.]}

Agarwal, A., George, M., Jeyaraj, A., & Schwarzkopf, M. (2021). Retrofitting GDPR compliance onto legacy databases. Proceedings of the VLDB Endowment, 15(4). https://vldb.org/pvldb/vol15/p958-george.pdf

Shastri, S., Banakar, V., Wasserman, M., Kumar, A., & Chidambaram, V. Understanding and Benchmarking the Impact of GDPR on Database Systems. Proceedings of the VLDB Endowment, 13(7). https://www.vldb.org/pvldb/vol13/p1064-shastri.pdf

Stoyanovich, J., Howe, B., & Jagadish, H. V. (2020). Responsible data management. Proceedings of the VLDB Endowment, 13(12). https://www.vldb.org/pvldb/vol13/p3474-asudeh.pdf

Connolly, T. M., & Begg, C. E. (2005). Database systems: a practical approach to design, implementation, and management. Pearson Education. Chapters “Security and Administration” & “Professional, Legal, and Ethical Issues in Data Management”.

Query Processing & Query Optimisation

^{[Wikipedia articles: Hash join, Worst-case optimal join algorithm, Query plan, Query optimization.]}

Barber, R., Lohman, G., Pandis, I., Raman, V., Sidle, R., Attaluri, G., ... & Sharpe, D. (2014). Memory-efficient hash joins. Proceedings of the VLDB Endowment, 8(4), 353-364. https://www.vldb.org/pvldb/vol8/p353-barber.pdf

Ngo, H. Q., Porat, E., Ré, C., & Rudra, A. (2018). Worst-case optimal join algorithms. Journal of the ACM (JACM), 65(3), 1-40.https://dl.acm.org/doi/pdf/10.1145/3180143

Abiteboul, S., Hull, R., & Vianu, V. (1995). Foundations of databases (Vol. 8). Reading: Addison-Wesley. Chapter 6 - Static Analysis and Optimization http://webdam.inria.fr/Alice/pdfs/Chapter-6.pdf

Hellerstein, J. M., Stonebraker, M., & Hamilton, J. (2007). Architecture of a database system. Foundations and Trends® in Databases, 1(2), 141-259. https://perspectives.mvdirona.com/content/binary/ArchitectureOfDatabaseSystem.pdf

Moerkotte, G. (2020). Draft for Book on “Building Query Compilers”, Chapter 2 “Textbook Query Optimization”. https://pi3.informatik.uni-mannheim.de/~moer/querycompiler.pdf

Vu, T. (2019, June). Deep query optimization. In Proceedings of the 2019 International Conference on Management of Data (pp. 1856-1858). https://dl.acm.org/doi/pdf/10.1145/3299869.3300104

Ding, J., Marcus, R., Kipf, A., Nathan, V., Nrusimha, A., Vaidya, K., ... & Kraska, T. (2022). SageDB: An Instance-Optimized Data Analytics System. Proceedings of the VLDB Endowment, 15(13), 4062-4078. https://www.vldb.org/pvldb/vol15/p4062-ding.pdf

Connolly, T. M., & Begg, C. E. (2005). Database systems: a practical approach to design, implementation, and management. Pearson Education. Chapter “Query Processing”.

Database Index Structures

^{[Wikipedia articles: Database index, B+ tree, Sargable, Category:Database index techniques.]}

Comer, D. (1979). Ubiquitous B-tree. ACM Computing Surveys (CSUR), 11(2), 121-137.https://dl.acm.org/doi/pdf/10.1145/356770.356776

Idreos, S., et al. (2018). The periodic table of data structures. IEEE Data Eng. Bull https://stratos.seas.harvard.edu/files/stratos/files/periodictabledatastructures.pdf.

Kraska, T., Beutel, A., Chi, E. H., Dean, J., & Polyzotis, N. (2018, May). The case for learned index structures. In Proceedings of the 2018 international conference on management of data (pp. 489-504).https://dl.acm.org/doi/pdf/10.1145/3183713.3196909

Connolly, T. M., & Begg, C. E. (2005). Database systems: a practical approach to design, implementation, and management. Pearson Education. Chapter “Methodology—Physical Database Design for Relational Databases”.

Turing Awards related to Databases

^{[Wikipedia articles: Turing Award, Navigational database, Relational database, Parallel database, Ingres (database), PostgreSQL.]}

Bachman, C. W. (1973). The programmer as navigator. Communications of the ACM, 16(11), 653-658. https://dl.acm.org/doi/pdf/10.1145/355611.362534

Codd, E. F. (1981). Relational Database: A Practical Foundation for Productivity. Communications of the ACM, 25(2), 109-117. https://dl.acm.org/doi/pdf/10.1145/1283920.1283937

Gray, J. (1999). What next? A dozen information-technology research goals. In ACM Turing award lectures (p. 1998). https://dl.acm.org/doi/pdf/10.1145/1283920.2159561

Stonebraker, M. (2014). Development of INGRES, Legacy of INGRES, Postgres. https://amturing.acm.org/award_winners/stonebraker_1172121.cfm

Research

Research Interests:

My research focuses primarily on algorithms, data structures and summaries to manage very large or sensitive data. The overall goal is to build a full data pipeline that feeds end users with easily interpretable facts which provide novel insights and aid decision making processes. Reducing the data complexity either through sampling or summarisation plays a crucial role to support exploratory interactions with the data that involve a lot of probing, while still providing an intuitive approximation model of the data. Sensitive data motivated privacy-preserving techniques such as differential privacy & federated learning to facilitate data sharing between organisations whilst reducing risks to the privacy of patients, users, customers and employees whose personal information is collected. Even though the big picture in the data is barely affected by a single individual, how to keep privacy leakage at a negligible level when revealing the big picture remains a challenging problem that requires further research.

Research Topics:

Federated Learning

How can one train models on data shared by different organisations without explicitly exchanging training data?

Differential Privacy

How can one share the big picture in the data without exposing the personal information of an individual?

Random Sampling

How can one efficiently and reliably sample a desired probability distribution where that is not a trivial task?

Data Summaries

How can one summarise data in a way such that the summary is guaranteed to accurately show the big picture in the data?

Query Processing

What algorithms and data structures are needed to answer questions about data a lot more efficiently?

Academic Services:

Reviewer on Program Committees

Machine Learning
- Conference on Neural Information Processing Systems (NeurIPS): 2022 (top 11% reviewer), 2023
- International Conference for Machine Learning (ICML): 2022 (top 10% reviewer), 2023
- International Conference on Learning Representations (ICLR): 2024
- International Conference on Artificial Intelligence and Statistics (AISTATS): 2023
Data Management / Data Science / Data Mining
- IEEE International Conference on Data Engineering (ICDE): 2022
- IEEE International Conference on Data Science and Advanced Analytics (DSAA): 2023
- ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD): 2022
(More on my personal website: https://shekelyan.science)

Reviewer on Demo & Posters Program Commitees

Distributed Systems
- ACM International Conference on Distributed and Event-Based Systems (DEBS): 2021
Scientific Data Management
- International Conference on Scientific and Statistical Database Management (SSDBM): 2018

Proceedings Chair

Database Theory
- International Conference on Database Theory (ICDT): 2024

Publications

Differential Privacy

Shekelyan & Loukides
Differentially Private Top-k Selection via Canonical Lipschitz Mechanism
ArXiv (2022) [preprint, project, pdf]

Lipschitz Mechanism: Can we relax the report-noisy-max mechanism (add noise to all item scores and pick item with max resulting score) to a constraint on the noise distribution such that it unifies various differentially private selection mechanisms (exponential mechanism, permute-and-flip etc)?

Canonical Loss Function: How can one efficiently generate a histogram where each bin counts how many subsets of items would require a specific number of users to change their item scores to make that k-subset the top-k?

Canonical Lipschitz Mechanism: How can one efficiently select a k-subset by plugging the canonical loss function into various differentially private selection mechanisms?

Random Sampling

Shekelyan & Cormode
Sequential Random Sampling Revisited: Hidden Shuffle Method
AISTATS (2021) [conference, project, bibtex, pdf]

Hidden Shuffle Sampler: What's the fastest way to get a random subset and can we generate it on-the-fly in sorted/sequential order without storing more than a handful of values?

Shanghooshabad, Kurmanji, Ma, Shekelyan, Almasi & Triantafillou
PGMJoins: Random Join Sampling with Graphical Models
ACM SIGMOD (2021) [conference, bibtex, link]

PGMJoins: How can one sample from a big table that cross-references/joins a few smaller tables without having to create the big table, particularly for tricky types of joins?

Shekelyan, Cormode, Ma, Shanghooshabad & Triantafillou
Streaming Weighted Sampling over Join Queries
EDBT (2023) [conference, preprint, pdf]

How can one sample rows from a big table that is the result of cross-referencing/joining a few smaller tables, if we cannot create the big table, read smaller tables more than twice and fully copy smaller tables when we read them?

Data Summaries

Shekelyan, Dignoes & Gamper
DigitHist: a Histogram-Based Data Summary with Tight Error Bounds
PVLDB (2017) [conference, link, slides, bibtex, pdf]

DigitHist: How can one reduce a big dataset to a small summary that usually preserves the big picture in terms of the minimal/maximal number of points inside rectangular regions?

Shekelyan, Dignoes, Gamper & Garofalakis
Approximating Multidimensional Range Counts with Maximum Error Guarantees
IEEE ICDE (2021) [conference, bibtex, pdf]

SliceHist: How can one reduce a big low-dimensional dataset to a small summary that is guaranteed to preserves the big picture in terms of minimal/maximal number of points inside rectangular regions?

Cormode, Garofalakis & Shekelyan (alphabetically ordered)
Data-Independent Space Partitionings for Summaries
ACM PODS (2021) [conference, project, bibtex, pdf]

alpha-binnings: Given a big rectangle whose pieces we can copy, how many pieces do we need to copy to approximate any smaller rectangle with copied pieces and how many of those copied pieces could overlap with each other?

Query Processing

Solon P. Pissis, Michael Shekelyan, Chang Liu, Grigorios Loukides
Frequency-Constrained Substring Complexity.
SPIRE (2023) [conference, link]

Given a set of texts and the set of all substrings of some reference text, how can one in linear-time (ignoring logarithmic terms) compute a count matrix where each entry counts how many distinct substrings have a certain length and appear in a certain number of texts?

Shekelyan, Dignoes & Gamper
Sparse prefix sums: Constant-time range sum queries over sparse multidimensional data cubes
INFORMATION SYSTEMS (2019) [journal, slides, bibtex, link]

Given a count matrix with only a few non-zero entries, what is a reasonably sized data structure that enables us to sum any sub-matrix in a few microseconds?

Shekelyan, Josse & Schubert
ParetoPrep: Efficient Lower Bounds for Path Skylines and Fast Path Computation
SSTD (2015) [conference, link, project, bibtex, pdf]

Given a network with multi-cost edges and two end nodes, how can one find the shortest/cheapest paths for all trade-offs between cost dimensions?

Shekelyan, Josse & Schubert
Linear path skylines in multicriteria networks
IEEE ICDE (2015) [conference, link, project, bibtex, pdf]

Given a network with multi-cost edges, how can one find the shortest/cheapest paths between two end nodes for all linear combinations of cost dimensions?

Shekelyan, Josse & Schubert
Linear path skylines in bicriteria networks
DASFAA (2014) [conference, link, project, bibtex, pdf]

Given a network with edges with multi-cost edges, how can one find the shortest/cheapest paths between two end nodes for all linear combinations of two cost dimensions?

Information Retrieval

Kriegel, Petri, Schubert, Shekelyan, Stockerl
Efficient similarity search on 3D bounding box annotations
SPIE (2012)

Given a database of CT scans of tissue anomalies, how can one find for a new CT scan of another tissue anomaly similar anomalies in the database?