Data profiling is the process of examining the data available from an existing information source (e.g., a database or a file) and collecting statistics or informative summaries about that data. The purpose of these statistics may be to:
- Find out whether existing data can be easily used for other purposes
- Improve the ability to search data by tagging it with keywords or descriptions, or by assigning it to a category
- Assess data quality, including whether the data conforms to particular standards or patterns
- Assess the risk involved in integrating data into new applications, including the challenges of joins
- Assess whether known metadata accurately describes the actual values in the source database
- Understand data challenges early in any data-intensive project, so that surprises late in the project are avoided. Finding data problems late in a project can lead to delays and cost overruns.
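As a rough illustration of the kind of statistics a profiler collects, the following Python sketch computes per-column counts, missing values, distinct values, minimum and maximum, and optional conformance to a regular-expression pattern, plus a simple check of key overlap between two columns that are to be joined. It uses only the standard library and assumes values are hashable. The function names `profile_column`, `profile_table`, and `join_key_overlap`, the choice to treat empty strings as missing, and the sample rows are all hypothetical and chosen for this example; they do not reflect any particular profiling tool.

```python
import re
from collections import Counter
from typing import Any, Optional


def profile_column(values: list[Any], pattern: Optional[str] = None) -> dict[str, Any]:
    """Collect simple summary statistics for one column of values."""
    # Assumption for this sketch: None and "" both count as missing.
    present = [v for v in values if v is not None and v != ""]
    counts = Counter(present)
    profile: dict[str, Any] = {
        "count": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0] if counts else None,
    }
    if present:
        try:
            profile["min"] = min(present)
            profile["max"] = max(present)
        except TypeError:
            # Mixed types (e.g. ints and strings) cannot be ordered.
            profile["min"] = profile["max"] = None
    if pattern is not None:
        # Values that do not fully match the expected pattern.
        regex = re.compile(pattern)
        profile["pattern_violations"] = [v for v in present if not regex.fullmatch(str(v))]
    return profile


def profile_table(rows: list[dict[str, Any]],
                  patterns: Optional[dict[str, str]] = None) -> dict[str, dict[str, Any]]:
    """Profile every column that appears in any row; absent keys count as missing."""
    patterns = patterns or {}
    columns = sorted({key for row in rows for key in row})
    return {col: profile_column([row.get(col) for row in rows], patterns.get(col))
            for col in columns}


def join_key_overlap(left_keys: list[Any], right_keys: list[Any]) -> dict[str, Any]:
    """Summarize how well two key columns would match in a join (nulls ignored)."""
    left = Counter(k for k in left_keys if k is not None)
    right = Counter(k for k in right_keys if k is not None)
    return {
        "matched": len(left.keys() & right.keys()),
        # Left keys with no partner: an inner join drops these rows.
        "left_only": set(left) - set(right),
        "right_only": set(right) - set(left),
        # Keys repeated on the right multiply matching left rows in the result.
        "right_duplicates": {k for k, n in right.items() if n > 1},
    }


rows = [
    {"id": 1, "zip": "02139", "email": "a@example.com"},
    {"id": 2, "zip": "2139", "email": None},
    {"id": 3, "zip": "02139", "email": "b@example.com"},
]
for column, stats in profile_table(rows, patterns={"zip": r"\d{5}"}).items():
    print(column, stats)
print(join_key_overlap([1, 2, 3], [2, 3, 3, 4]))
```

In this sample, the profile flags `"2139"` as a ZIP value that does not match the five-digit pattern and reports one missing email, while the join check shows that key `1` has no match and key `3` is duplicated on the right. A real profiler would typically add type inference, value-length and frequency distributions, and cross-column dependency checks.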