We are happy to contribute to open source data and software.

Data

  • GitTables - a corpus of 1.7 million relational tables extracted from CSV files on GitHub. The table columns are automatically annotated with semantic types from Schema.org.
  • MeasEval - a manually annotated dataset for entity and semantic relation extraction, focused on counts and measurements, the attributes of these quantities, and additional contextual information. It was produced for SemEval-2021 Task 8, which we organized.

Software

  • mlinspect - a tool for analyzing and inspecting Python machine learning pipelines to detect common issues.
  • Torch-RGCN - a PyTorch implementation of relational graph convolutional networks.
  • BLP - a model for inductive link prediction and entity classification in knowledge graphs whose entities have textual descriptions.
  • BioBLP - a modular framework for learning on multimodal biomedical knowledge graphs.
  • conversationkg - a package for turning email lists into knowledge graphs.