posted 4 years ago in CUBRID Apps&Tools category by Esen Sagynov
Have you ever had to analyze your users' data to check whether some naughty user had entered trash text instead of a real first and last name? Or have you ever needed an open source tool that would split your users' full names into first and last names, then automatically insert them into the appropriate columns of your database table? Then meet DataCleaner!
DataCleaner is an open source data quality analyzer which can work with CUBRID, Oracle, MS SQL Server, MySQL, PostgreSQL, CSV, and other datasources to help you quickly analyze, clean, and profile your data. Its core is a strong data profiling engine that is also extensible, which is how it adds data cleansing, transformation, enrichment, duplicate finding, matching, and merging functionality.
It is very exciting that DataCleaner now supports CUBRID RDBMS in its latest release, 2.5.1. Kasper Sørensen, the founder and developer of DataCleaner, says he is "very pleased to see DataCleaner finally supporting the CUBRID database." He added: "With its focus on web applications I see a tremendous potential for the CUBRID database in conjunction with DataCleaner. And so far the CUBRID community has been a joy serving."
Here is how Kasper explains why DataCleaner can be a useful tool for Web developers:
Every self-respecting web application takes user input in one form or another. But as it turns out, we as webapp developers do not always foresee every possible type of user input. So over time data quality issues will occur! Typically what we see happening in web applications is in the form of missing values, fake/spam/made-up values, wrongly formatted database statements leading to values being placed in the wrong fields, duplicated records and so on. All typically because web applications are rapidly being developed and maintained in a quite agile way.
Here is what you can do with DataCleaner:
- Profile and analyze your database within minutes!
- Access almost any datastore - Oracle, MySQL, PostgreSQL, MS SQL Server, MongoDB, CUBRID, CSV files, Excel spreadsheets, DBase and more.
- Discover patterns in your textual data with the Pattern Finder.
- Find out which values occur the most with the Value Distribution profile.
- Cleanse your contact details with name and address validations.
- Detect duplicates using fuzzy logic and configurable weights and thresholds.
- Merge your duplicates and create a single version of the truth.
- Write data back to relational databases, CSV files, Excel spreadsheets or MongoDB databases.
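To give a feel for what the Pattern Finder and Value Distribution profiles report, here is a minimal Python sketch of the two ideas. It is not DataCleaner's implementation, which is far more capable. The function names (to_pattern, value_distribution, pattern_distribution) and the sample names are made up for illustration, and the pattern rule used here (letters become "a" or "A", digits become "9") is a simplifying assumption.

```python
from collections import Counter


def to_pattern(value: str) -> str:
    """Reduce a string to a coarse pattern (hypothetical rule, not DataCleaner's).

    Uppercase letters become 'A', lowercase letters 'a', digits '9';
    every other character (spaces, punctuation) is kept as-is.
    """
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)
    return "".join(out)


def value_distribution(values):
    """Return (value, count) pairs, most frequent first."""
    return Counter(values).most_common()


def pattern_distribution(values):
    """Return (pattern, count) pairs, most frequent first."""
    return Counter(to_pattern(v) for v in values).most_common()


# Made-up sample of a "full name" column containing some junk entries.
names = ["John Smith", "Anna Brown", "asdf1234", "John Smith", "xx"]

for pattern, count in pattern_distribution(names):
    print(f"{pattern!r}: {count}")

for value, count in value_distribution(names):
    print(f"{value!r}: {count}")
```

In this sample the dominant pattern is the one shared by the real-looking names, so the rare patterns are the rows worth inspecting first. The value distribution, in turn, shows that one name appears twice, which may point to a duplicate record.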
Frankly speaking, DataCleaner is so sophisticated that when I ran it for the first time, I was stunned by the number of features it provides. So that you do not feel lost in this ocean of data analysis possibilities, I recommend that you first watch some of the DataCleaner screencasts available on the project website. Then, based on the video instructions and your needs, you can launch DataCleaner and get started with data cleansing and analysis. We have also created a short tutorial which shows you how to quickly connect DataCleaner to CUBRID Database.
CUBRID community members will certainly benefit greatly from the whole set of powerful features that DataCleaner provides. On behalf of our community, I would like to thank Kasper for his valuable contribution.
Information for our readers!
If you develop an open source application and would like to become a CUBRID Partner by supporting CUBRID Database in your project, contact us by email at firstname.lastname@example.org. In your letter, please provide an overview of your software, project links, and a statement on behalf of your project. We will be very glad to have you on board!