6 data profiling tools: open source and commercial

Data profiling, a tedious and labor-intensive activity, can be automated with tools to make huge data projects more feasible.

- Distinct count and percent: identifies natural keys, that is, distinct values in each column that can help process inserts and updates.
- Percent of zero / blank / null values: identifies missing or unknown data. Helps ETL architects set up appropriate default values.
- Minimum / maximum / average string length: helps select appropriate data types and sizes in the target database. Enables setting column widths just wide enough for the data, to improve performance.
- Key integrity: ensures keys are always present in the data, using zero/blank/null analysis. Also helps identify orphan keys, which are problematic for ETL and future analysis.
- Cardinality: checks relationships like one-to-one, one-to-many, and many-to-many between related data sets. This helps BI tools perform inner or outer joins correctly.
- Pattern and frequency distributions: checks whether data fields are formatted correctly, for example whether emails are in a valid format. Extremely important for data fields used for outbound communications (emails, phone numbers, addresses).
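To make the checks described above concrete, here is a minimal sketch in plain Python of how distinct counts, missing-value percentages, string lengths, pattern matching, orphan keys, and cardinality could be computed. It is not how any particular profiling tool works. The function names (`is_missing`, `profile_column`, `orphan_keys`, `cardinality`), the simplified email pattern, and the decision to treat zero as missing are all assumptions made for illustration.

```python
import re
from collections import Counter

# Deliberately simple email pattern; an assumption for illustration only,
# not a full validator.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_missing(value):
    """Treat None, blank strings, and numeric zero as missing (an assumption)."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    return value == 0


def profile_column(values, pattern=None):
    """Return basic profiling metrics for one column given as a list of values."""
    total = len(values)
    present = [v for v in values if not is_missing(v)]
    distinct = len(set(present))
    lengths = [len(str(v)) for v in present]
    profile = {
        "distinct_count": distinct,
        "distinct_pct": 100.0 * distinct / total if total else 0.0,
        "missing_pct": 100.0 * (total - len(present)) / total if total else 0.0,
        "min_len": min(lengths) if lengths else 0,
        "max_len": max(lengths) if lengths else 0,
        "avg_len": sum(lengths) / len(lengths) if lengths else 0.0,
    }
    if pattern is not None:
        # Share of non-missing values that match the expected format.
        matches = sum(1 for v in present if pattern.match(str(v)))
        profile["pattern_match_pct"] = (
            100.0 * matches / len(present) if present else 0.0
        )
    return profile


def orphan_keys(child_keys, parent_keys):
    """Key integrity: child keys that have no matching parent key."""
    parents = set(parent_keys)
    return sorted({k for k in child_keys if not is_missing(k) and k not in parents})


def cardinality(pairs):
    """Classify the relationship implied by (left_key, right_key) pairs."""
    unique_pairs = set(pairs)
    left_counts = Counter(left for left, _ in unique_pairs)
    right_counts = Counter(right for _, right in unique_pairs)
    left_many = any(c > 1 for c in left_counts.values())    # one left key, several right keys
    right_many = any(c > 1 for c in right_counts.values())  # one right key, several left keys
    if left_many and right_many:
        return "many-to-many"
    if left_many:
        return "one-to-many"
    if right_many:
        return "many-to-one"
    return "one-to-one"


# Hypothetical sample data
emails = ["a@example.com", "b@example.com", "", None, "not-an-email", "a@example.com"]
print(profile_column(emails, EMAIL_PATTERN))
print(orphan_keys([1, 2, 3, None, 5], [1, 2, 3]))
print(cardinality([(1, "a"), (1, "b"), (2, "c")]))
```

In practice these metrics would usually be computed inside the database or by a profiling tool rather than in application code. The sketch only shows what each metric measures.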