Open Source Data Catalog Solutions
Amundsen
Amundsen is an open-source data discovery and metadata platform created at Lyft. It helps organizations find, understand, and trust their data.
Key Features
- Data discovery through search functionality
- Table and column-level metadata
- Data lineage visualization
- User-friendly UI with table popularity metrics
- Integration with various data sources (Hive, Presto, Redshift, Snowflake, etc.)
- Support for custom metadata
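To make the feature list concrete, here is a minimal sketch of what table-level and column-level metadata, custom metadata, and popularity-ranked search could look like. It does not use Amundsen's actual schema or API: the class names (`TableMetadata`, `ColumnMetadata`), the `usage_count` field, and the sample tables are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnMetadata:
    # Hypothetical column-level record: name, type, optional description.
    name: str
    col_type: str
    description: str = ""


@dataclass
class TableMetadata:
    # Hypothetical table-level record, not Amundsen's real data model.
    source: str                 # e.g. "hive" or "snowflake"
    schema: str
    name: str
    description: str = ""
    columns: list[ColumnMetadata] = field(default_factory=list)
    usage_count: int = 0        # assumed stand-in for a popularity metric
    custom: dict[str, str] = field(default_factory=dict)  # custom metadata


def search(tables: list[TableMetadata], term: str) -> list[TableMetadata]:
    """Return tables matching the term by name, description, or column name,
    ordered from most used to least used."""
    term = term.lower()
    hits = [
        t for t in tables
        if term in t.name.lower()
        or term in t.description.lower()
        or any(term in c.name.lower() for c in t.columns)
    ]
    return sorted(hits, key=lambda t: t.usage_count, reverse=True)


# Made-up sample catalog entries.
catalog = [
    TableMetadata("hive", "core", "rides", "One row per ride",
                  [ColumnMetadata("ride_id", "bigint")], usage_count=120,
                  custom={"owner_team": "marketplace"}),
    TableMetadata("snowflake", "staging", "rides_raw", "Unprocessed ride events",
                  [ColumnMetadata("payload", "variant")], usage_count=8),
]

for table in search(catalog, "ride"):
    print(f"{table.source}://{table.schema}.{table.name} (used {table.usage_count} times)")
```

Ranking by a usage counter is one simple way to surface the tables people actually rely on; a real deployment would derive that signal from query logs rather than a hand-set field.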
Architecture
Amundsen consists of three microservices, each built on a different backing technology:
- Metadata Service - serves metadata stored in a Neo4j graph database
- Search Service - serves search queries backed by Elasticsearch
- Frontend Service - Flask application that hosts the React user interface
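The division of labor between the three services can be sketched with in-memory stand-ins. This is a toy model under stated assumptions, not Amundsen's code: `MetadataStore`, `SearchIndex`, and `frontend_search` are hypothetical names, a plain dict replaces Neo4j, and a token-to-key map replaces Elasticsearch.

```python
class MetadataStore:
    """Stand-in for the metadata service; a dict replaces the graph database."""

    def __init__(self) -> None:
        self._tables: dict[str, dict] = {}

    def put(self, key: str, record: dict) -> None:
        self._tables[key] = record

    def get(self, key: str) -> dict:
        return self._tables[key]


class SearchIndex:
    """Stand-in for the search service; a token-to-keys map replaces Elasticsearch."""

    def __init__(self) -> None:
        self._index: dict[str, set[str]] = {}

    def add(self, key: str, text: str) -> None:
        # Index every whitespace-separated token, lowercased.
        for token in text.lower().split():
            self._index.setdefault(token, set()).add(key)

    def query(self, term: str) -> list[str]:
        # Exact-token match only; a real search engine does far more.
        return sorted(self._index.get(term.lower(), set()))


def frontend_search(term: str, index: SearchIndex, store: MetadataStore) -> list[dict]:
    """What a UI request does in this sketch: ask search for keys, then fetch details."""
    return [store.get(key) for key in index.query(term)]


store, index = MetadataStore(), SearchIndex()
record = {"key": "hive://core.rides", "description": "one row per ride"}
store.put(record["key"], record)
index.add(record["key"], "rides " + record["description"])

print(frontend_search("ride", index, store))
```

Keeping search and metadata behind separate interfaces, as in this sketch, is what lets each one be scaled or swapped independently of the user interface.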
Getting Started
DataHub
DataHub, originally developed at LinkedIn, is a data catalog built to enable end-to-end data discovery, data observability, and data governance.
Apache Atlas
Apache Atlas provides open metadata management and governance capabilities for organizations to build a catalog of their data assets.
Metacat
Metacat by Netflix is a unified metadata exploration API service that provides metadata for various data sources.
Marquez
Marquez is an open-source metadata service for the collection, aggregation, and visualization of a data ecosystem’s metadata.