Publications

2024

  • Magneto: Combining Small and Large Language Models for Schema Matching
    Yurong Liu, Eduardo Pena, Aécio Santos, Eden Wu, Juliana Freire
    ArXiv: 2412.08194
  • Enhancing Biomedical Schema Matching with LLM-based Training Data Generation
    Yurong Liu, Aécio Santos, Eduardo H. M. Pena, Roque Lopez, Eden Wu, Juliana Freire
    TRL '24: NeurIPS 2024 Third Table Representation Learning Workshop
  • Sampling Methods for Inner Product Sketching
    Majid Daliri, Juliana Freire, Christopher Musco, Aécio Santos, Haoxiang Zhang
    VLDB '24: Proceedings of the VLDB Endowment, Vol. 17, No. 9
  • Efficiently Estimating Mutual Information Between Attributes Across Tables
    Aécio Santos, Flip Korn, Juliana Freire
    ICDE'24: 2024 IEEE 40th International Conference on Data Engineering (ICDE)
  • Simple Analysis of Priority Sampling
    Majid Daliri, Juliana Freire, Christopher Musco, Aécio Santos, Haoxiang Zhang
    SOSA '24: 2024 Symposium on Simplicity in Algorithms

2023

  • AlphaD3M: An AutoML Library for Multiple ML Tasks
    Roque Lopez, Raoni Lourenco, Remi Rampin, Sonia Castelo, Aécio Santos, Jorge Ono, Claudio Silva, Juliana Freire
    AutoML '23: AutoML Conference 2023 (ABCD Track)
  • Weighted Minwise Hashing Beats Linear Sketching for Inner Product Estimation
    Aline Bessa, Majid Daliri, Juliana Freire, Cameron Musco, Christopher Musco, Aécio Santos, Haoxiang Zhang
    PODS '23: Proceedings of the 42nd ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
  • Using Pipeline Performance Prediction to Accelerate AutoML Systems
    Haoxiang Zhang, Roque López, Aécio Santos, Jorge Piazentin Ono, Aline Bessa, and Juliana Freire
    DEEM '23: Proceedings of the Seventh Workshop on Data Management for End-to-End Machine Learning
    🏆 Received the workshop's Best Paper Award.

2022

  • NYUCIN at the NTCIR-16 Dataset Search 2 Task
    Levy Silva, Luciano Barbosa, Sonia Castelo, Haoxiang Zhang, Aécio Santos, Juliana Freire
    NTCIR 16: Proceedings of the 16th NTCIR Conference on Evaluation of Information Access Technologies
    🏆 Our solution for the NTCIR-16 Dataset Search 2 task ranked 1st among all submitted runs.
  • A Sketch-based Index for Correlated Dataset Search
    Aécio Santos, Aline Bessa, Christopher Musco, Juliana Freire
    ICDE'22: Proceedings of the 2022 IEEE 38th International Conference on Data Engineering (ICDE)

2021

  • DSDD: Domain-Specific Dataset Discovery on the Web
    Haoxiang Zhang, Aécio Santos, Juliana Freire
    CIKM'21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
  • Auctus: A Dataset Search Engine for Data Discovery and Augmentation
    Sonia Castelo, Rémi Rampin, Aécio Santos, Aline Bessa, Fernando Chirigati, Juliana Freire
    PVLDB'21: Proceedings of the VLDB Endowment: Volume 14, Issue 12
    Demonstration at the VLDB 2021 conference.
  • Correlation Sketches for Approximate Join-Correlation Queries
    Aécio Santos, Aline Bessa, Fernando Chirigati, Christopher Musco, Juliana Freire
    SIGMOD'21: Proceedings of the 2021 International Conference on Management of Data
  • An Ecosystem of Tools for Modeling Political Violence
    Aline Bessa, Sonia Castelo, Rémi Rampin, Aécio Santos, Mike Shoemate, Vito D'Orazio, Juliana Freire
    SIGMOD'21: Proceedings of the 2021 International Conference on Management of Data

2020

  • Towards Evaluating Exploratory Model Building Process with AutoML Systems
    Sunsoo (Ray) Hong, Sonia Castelo, Vito D’Orazio, Christopher Benthune, Aécio Santos, Scott Langevin, David Jonker, Enrico Bertini, and Juliana Freire
    ArXiv: 2009.00449

2019

  • Visus: An Interactive System for Automatic Machine Learning Model Building and Curation
    Aécio Santos, Sonia Castelo, Cristian Felix, Jorge H. P. Ono, Bowen Yu, Ray Hong, Cláudio Silva, Enrico Bertini, and Juliana Freire
    HILDA'19: Proceedings of the Workshop on Human-In-the-Loop Data Analytics
    Presented at HILDA 2019 (co-located with SIGMOD'2019), in Amsterdam.
  • A Topic-Agnostic Approach for Identifying Fake News Pages
    Sonia Castelo, Thais Almeida, Anas Elghafari, Aécio Santos, Kien Pham, Eduardo Nakamura, and Juliana Freire
    WWW'19 Companion: The 2019 Web Conference Companion
    Presented at MisinfoWorkshop2019 (co-located with WWW'2019).
  • Bootstrapping Domain-Specific Content Discovery on the Web
    Kien Pham, Aécio Santos, Juliana Freire
    WWW'19: Proceedings of The Web Conference 2019

2018

  • Learning to Discover Domain-Specific Web Content
    Kien Pham, Aécio Santos, Juliana Freire
    WSDM'18: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining
    16% acceptance rate.

2016

  • Understanding Website Behavior based on User Agent
    Kien Pham, Aécio Santos, Juliana Freire
    SIGIR'16: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval
  • A First Study on Temporal Dynamics of Topics on the Web
    Aécio Santos, Bruno Pasini, Juliana Freire
    TempWeb'16: Proceedings of the 25th International Conference Companion on World Wide Web
    Presented at the 8th Temporal Web Analytics Workshop (co-located with WWW'2016).
  • Interactive Exploration for Domain Discovery on the Web
    Yamuna Krishnamurthy, Kien Pham, Aécio Santos, and Juliana Freire
    IDEA'16: KDD 2016 Workshop on Interactive Data Exploration and Analytics

2015

  • A Genetic Programming Framework to Schedule Webpage Updates
    Aécio S. R. Santos, Cristiano R. Carvalho, Jussara M. Almeida, Edleno S. Moura, Altigran S. Silva, Nivio Ziviani
    IRJ: Information Retrieval: Volume 18 Issue 1

2013

  • Learning to Schedule Webpage Updates Using Genetic Programming
    Aécio S. Santos, Nivio Ziviani, Jussara Almeida, Cristiano R. Carvalho, Edleno Silva Moura, Altigran Soares Silva
    SPIRE 2013: Proceedings of the 20th International Symposium on String Processing and Information Retrieval

Unpublished Work

2021

  • Sublogarithmic Algorithms for Planar Point Location
    Aécio Santos
    Technical report