Publications
2024
-
Magneto: Combining Small and Large Language Models for Schema Matching
Yurong Liu, Eduardo Pena, Aécio Santos, Eden Wu, Juliana Freire
ArXiv: 2412.08194
-
Enhancing Biomedical Schema Matching with LLM-based Training Data Generation
Yurong Liu, Aécio Santos, Eduardo H. M. Pena, Roque Lopez, Eden Wu, Juliana Freire
TRL '24: NeurIPS 2024 Third Table Representation Learning Workshop
-
Sampling Methods for Inner Product Sketching
Majid Daliri, Juliana Freire, Christopher Musco, Aécio Santos, Haoxiang Zhang
VLDB '24: Proceedings of the VLDB Endowment, Vol. 17, No. 9.
-
Efficiently Estimating Mutual Information Between Attributes Across Tables
Aécio Santos, Flip Korn, Juliana Freire
ICDE'24: 2024 IEEE 40th International Conference on Data Engineering (ICDE)
-
Simple Analysis of Priority Sampling
Majid Daliri, Juliana Freire, Christopher Musco, Aécio Santos, Haoxiang Zhang
SOSA '24: 2024 Symposium on Simplicity in Algorithms
2023
-
AlphaD3M: An AutoML Library for Multiple ML Tasks
Roque Lopez, Raoni Lourenco, Remi Rampin, Sonia Castelo, Aécio Santos, Jorge Ono, Claudio Silva, Juliana Freire
AutoML '23: AutoML Conference 2023 (ABCD Track)
-
Weighted Minwise Hashing Beats Linear Sketching for Inner Product Estimation
Aline Bessa, Majid Daliri, Juliana Freire, Cameron Musco, Christopher Musco, Aécio Santos, Haoxiang Zhang
PODS '23: Proceedings of the 42nd ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
-
Using Pipeline Performance Prediction to Accelerate AutoML Systems
Haoxiang Zhang, Roque López, Aécio Santos, Jorge Piazentin Ono, Aline Bessa, and Juliana Freire
DEEM '23: Proceedings of the Seventh Workshop on Data Management for End-to-End Machine Learning
🏆 Received the workshop's Best Paper Award.
2022
-
NYUCIN at the NTCIR-16 Dataset Search 2 Task
Levy Silva, Luciano Barbosa, Sonia Castelo, Haoxiang Zhang, Aécio Santos, Juliana Freire
NTCIR 16: Proceedings of the 16th NTCIR Conference on Evaluation of Information Access Technologies
🏆 Our solution for the 2022 NTCIR Dataset Search 2 competition ranked 1st among all submitted runs.
-
A Sketch-based Index for Correlated Dataset Search
Aécio Santos, Aline Bessa, Christopher Musco, Juliana Freire
ICDE'22: Proceedings of the 2022 IEEE 38th International Conference on Data Engineering (ICDE)
2021
-
DSDD: Domain-Specific Dataset Discovery on the Web
Haoxiang Zhang, Aécio Santos, Juliana Freire
CIKM'21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
-
Auctus: A Dataset Search Engine for Data Discovery and Augmentation
Sonia Castelo, Rémi Rampin, Aécio Santos, Aline Bessa, Fernando Chirigati, Juliana Freire
PVLDB'21: Proceedings of the VLDB Endowment: Volume 14, Issue 12
Demonstration at the VLDB 2021 conference.
-
Correlation Sketches for Approximate Join-Correlation Queries
Aécio Santos, Aline Bessa, Fernando Chirigati, Christopher Musco, Juliana Freire
SIGMOD'21: Proceedings of the 2021 International Conference on Management of Data
-
An Ecosystem of Tools for Modeling Political Violence
Aline Bessa, Sonia Castelo, Rémi Rampin, Aécio Santos, Mike Shoemate, Vito D'Orazio, Juliana Freire
SIGMOD'21: Proceedings of the 2021 International Conference on Management of Data
2020
-
Towards Evaluating Exploratory Model Building Process with AutoML Systems
Sunsoo (Ray) Hong, Sonia Castelo, Vito D’Orazio, Christopher Benthune, Aécio Santos, Scott Langevin, David Jonker, Enrico Bertini, and Juliana Freire
ArXiv: 2009.00449
2019
-
Visus: An Interactive System for Automatic Machine Learning Model Building and Curation
Aécio Santos, Sonia Castelo, Cristian Felix, Jorge H. P. Ono, Bowen Yu, Ray Hong, Cláudio Silva, Enrico Bertini, and Juliana Freire
HILDA'19: Proceedings of the Workshop on Human-In-the-Loop Data Analytics
Presented at HILDA 2019 (co-located with SIGMOD'2019), in Amsterdam.
-
A Topic-Agnostic Approach for Identifying Fake News Pages
Sonia Castelo, Thais Almeida, Anas Elghafari, Aécio Santos, Kien Pham, Eduardo Nakamura, and Juliana Freire
WWW'19 Companion: The 2019 Web Conference Companion
Presented at MisinfoWorkshop2019 (co-located with WWW'2019).
-
Bootstrapping Domain-Specific Content Discovery on the Web
Kien Pham, Aécio Santos, Juliana Freire
WWW'19: Proceedings of The Web Conference 2019
2018
-
Learning to Discover Domain-Specific Web Content
Kien Pham, Aécio Santos, Juliana Freire
WSDM'18: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining
16% acceptance rate.
2016
-
Understanding Website Behavior based on User Agent
Kien Pham, Aécio Santos, Juliana Freire
SIGIR'16: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval
-
A First Study on Temporal Dynamics of Topics on the Web
Aécio Santos, Bruno Pasini, Juliana Freire
TempWeb'16: Proceedings of the 25th International Conference Companion on World Wide Web
Presented at the 8th Temporal Web Analytics Workshop (co-located with WWW'2016).
-
Interactive Exploration for Domain Discovery on the Web
Yamuna Krishnamurthy, Kien Pham, Aécio Santos, and Juliana Freire
IDEA'16: KDD 2016 Workshop on Interactive Data Exploration and Analytics
2015
-
A Genetic Programming Framework to Schedule Webpage Updates
Aécio S. R. Santos, Cristiano R. Carvalho, Jussara M. Almeida, Edleno S. Moura, Altigran S. Silva, Nivio Ziviani
IRJ: Information Retrieval: Volume 18 Issue 1
2013
-
Learning to Schedule Webpage Updates Using Genetic Programming
Aécio S. Santos, Nivio Ziviani, Jussara Almeida, Cristiano R. Carvalho, Edleno Silva Moura, Altigran Soares Silva
SPIRE 2013: Proceedings of the 20th International Symposium on String Processing and Information Retrieval
Unpublished Work
2021
-
Sublogarithmic Algorithms for Planar Point Location
Aécio Santos
Technical report