Institute Chair Professor, Computer Science and Engineering, IIT Bombay.
Modern AI for Age-old problems of Database Systems
Modern deep learning methods are pushing the frontiers of many challenging problems in database systems. We will discuss state-of-the-art machine learning models that are providing record-breaking accuracy on age-old tasks such as entity resolution, missing value imputation, and natural language querying. We are also witnessing brand new capabilities that were not possible a few years ago. We can perform entity resolution across heterogeneous, multilingual datasets via actively learned nearest neighbor indices, thereby eliminating the need for hand-designing blocking predicates. On multi-dimensional analytical datasets, we can now obtain joint distributions over thousands of interacting time series. Advances in pre-trained language models have significantly increased the capability of handling natural variations when parsing text input to SQL. In this talk we will go over the latest ML research that is enabling these capabilities, and present directions for future research.
Sunita Sarawagi conducts research in the fields of databases and machine learning. She is Institute Chair Professor at IIT Bombay. She received her PhD in databases from the University of California at Berkeley and a bachelor's degree from IIT Kharagpur. She has also worked at Google Research (2014-2016), CMU (2004), and IBM Almaden Research Center (1996-1999). She was awarded the Infosys Prize in 2019 for Engineering and Computer Science, and the Distinguished Alumnus Award from IIT Kharagpur. Her publications include best paper awards at the ACM SIGMOD, VLDB, ICDM, NIPS, and ICML conferences. She has served on the board of directors of the ACM SIGKDD and VLDB foundation. She was program chair for the ACM SIGKDD 2008 conference and research track co-chair for the VLDB 2011 conference, and has served as a program committee member for the SIGMOD, VLDB, SIGKDD, ICDE, and ICML conferences and on the editorial boards of the ACM TODS and ACM TKDD journals.
Tsinghua University, Beijing, China.
openGauss: An Autonomous Database System
In this talk, I will present how to build an autonomous database system. I discuss how to integrate effective learning-based models into database systems to build learned optimizers (including learned query rewrite, learned cost/cardinality estimation, learned join order selection, and physical operator selection) and learned database advisors (including self-monitoring, self-diagnosis, self-configuration, and self-optimization). I also propose an effective validation model to validate the effectiveness of learned models. I discuss effective training data management and model management platforms to easily deploy learned models. Finally, I will introduce our autonomous database system openGauss.
Guoliang Li is a full professor and the deputy head of the Department of Computer Science, Tsinghua University, Beijing, China. His research interests include large-scale data integration and cleaning, human-in-the-loop data management, machine learning for databases, and databases for machine learning. He is a general co-chair of SIGMOD 2021, demo co-chair of VLDB 2021, industry co-chair of ICDE 2022, and PC co-chair of DASFAA 2019. He is also an associate editor of the VLDB Journal and IEEE TKDE. He is a steering committee member of IEEE TCDE and DASFAA. He received best paper awards (candidates) at VLDB 2020, ICDE 2018, KDD 2018, CIKM 2017, and DASFAA 2014. He received the Early Research Contribution Award of VLDB and the Early Career Award of IEEE TCDE.
Director of Center for Artificial Intelligence and Big Data (CARIDA) and Database Exploration Laboratory (DBXLAB),
University of Texas at Arlington
Fairness in Database Querying
We are constantly being judged by automated decision systems that have been criticized for being sometimes discriminatory and unfair. In this talk, we focus on fairness issues that arise when users perform ad-hoc exploration of databases using commonly available querying mechanisms such as selection/range queries, ranking queries, top-k queries, etc. For example, a user may use such queries to retrieve suitable employment opportunities in a jobs database, dating partners on a matching website, or apartments to rent in a real estate database. We will discuss how such querying mechanisms can sometimes give results that are discriminatory, and discuss approaches to detect, mitigate, and prevent such scenarios. Our work represents some of the initial steps towards the broader goal of integrating fairness conditions into database query processing and data management.
Dr. Das is the Associate Dean for Research, College of Engineering, a Distinguished University Chair Professor of Computer Science and Engineering, Director of the Center for Artificial Intelligence and Big Data (CARIDA), and Director of the Database Exploration Laboratory (DBXLAB) at UT-Arlington. Prior to joining UTA in 2004, he held positions at Microsoft Research, Compaq Corporation, and the University of Memphis. He graduated with a B.Tech. in computer science from IIT Kanpur, India in 1983, and with a Ph.D. in computer science from the University of Wisconsin, Madison in 1990. He is a Fellow of the IEEE and a member of the ACM.
Dr. Das has published over 200 papers, many of which have appeared in premier data mining, database, and algorithms conferences and journals. His work has received several awards, including the Communications of the ACM Research Highlights in 2021, ACM SIGMOD Research Highlights in 2019, IEEE ICDE 10-Year Influential Paper Award in 2012, ACM SIGKDD Doctoral Dissertation Award (honorable mention) in 2014 for his former student, and numerous other awards. He has presented keynotes, invited lectures, tutorials, and courses at various universities, research labs, and conferences. He has been on the editorial boards of the journals ACM Transactions on Database Systems and IEEE Transactions on Knowledge and Data Engineering. He has served in organizational roles for several major conferences, including as General Chair of ACM SIGMOD/PODS 2018.
Inria and Institut Polytechnique de Paris.
Teasing journalistic findings out of heterogeneous sources: a data/AI journey
Freedom of the press is under threat worldwide, and the quality of information that people have access to is dangerously degraded, under the joint threat of non-democratic governments and fake information propagation. The press as an industry needs powerful data management tools to help it interpret the complex reality surrounding us.
Since 2018, I have been cooperating with journalists from Le Monde, France's leading newspaper, in devising tools for analyzing the large and heterogeneous data sources that they are interested in. This research has been embodied in ConnectionLens, a graph ETL tool capable of ingesting heterogeneous data sources into a graph, enriched (with the help of ML methods) with entities extracted from data of any type. On such integrated graphs, we devised novel algorithms for keyword search, and in more recent research we combine them with structured querying. The talk describes the architecture and main algorithmic challenges in building and exploiting ConnectionLens graphs, illustrated in particular on an application where we study conflicts of interest in the biomedical domain. This is joint work with A. Anadiotis, O. Balalau, H. Galhardas and many others. ConnectionLens Web site (papers+code): https://team.inria.fr/cedar/connectionlens/
This research has been funded by the Agence Nationale de la Recherche AI Chair SourcesSay (https://sourcessay.inria.fr).
Ioana Manolescu is a senior researcher at Inria Saclay and a part-time professor at Ecole Polytechnique, France. She is the lead of the CEDAR INRIA team focusing on rich data analytics at cloud scale. She is also the scientific director of LabIA, a program run by the French government in which AI problems raised by branches of the local and national French public administration are tackled by French research teams. She is a member of the PVLDB Endowment Board of Trustees, and has been Associate Editor for PVLDB, president of the ACM SIGMOD PhD Award Committee, chair of the IEEE ICDE conference, and a program chair of EDBT, SSDBM, and ICWE, among others. She has co-authored more than 150 articles in international journals and conferences and co-authored books on "Web Data Management" and on "Cloud-based RDF Data Management". Her main research interests include algebraic and storage optimizations for semistructured data, in particular Semantic Web graphs; novel data models and languages for complex data management; and data models and algorithms for fact-checking and data journalism, a topic on which she is collaborating with journalists from Le Monde. She is also a recipient of the ANR AI Chair titled "SourcesSay: Intelligent Analysis and Interconnexion of Heterogeneous Data in Digital Arenas" (2020-2024).
Senior Vice President,
Data and In-Memory Technologies,
Oracle Database In-Memory: The Enterprise, at Warp Speed!
In-memory computing is about more than speed. It enables a fundamental transformation in business processes. Air travel, by analogy, enabled more than just the ability to travel faster: it enabled a completely new global economy, reshaped politics, and transformed society. Oracle's Database In-Memory feature similarly enables not just faster analytics, but a fundamental rethinking and drastic simplification of the traditional analytic platform. Combined with Oracle's many converged database capabilities that bring together many data models and many workloads, and with Oracle's Autonomous Database platform that makes self-driving, machine-learning-powered databases a reality, Oracle Database In-Memory allows for the development of a new category of enterprise architectures, with significant reductions in cost and complexity, while providing unmatched performance for both transactional and analytic workloads.
Tirthankar Lahiri is Senior Vice President of the Data and In-Memory Technologies area within Oracle Database. This includes the Oracle Database Engine (Transactions, Data formats, Indexes, Advanced Compression, Database In-Memory, the Database Filesystem, etc.), the Oracle TimesTen In-Memory Database, and Oracle NoSQLDB. Tirthankar has 26 years of experience in the database industry and has worked in a number of areas such as Performance, Scalability, Manageability, and In-Memory architectures. He has 45 issued and several pending patents and a number of academic publications. He has a B.Tech in Computer Science from the Indian Institute of Technology (Kharagpur) and an MS in Electrical Engineering from Stanford University. He was in the PhD program at Stanford, and his research included NUMA Operating Systems (the Hive project) and Semistructured Data (the Ozone project) before his PhD was superseded by his industrial career.