Data Collection and Loading
“Computers are cheap and powerful, but clean data is scarce and expensive.”
– Bob Cringely
There’s a three-letter acronym for this whole process – ETL (Extraction, Transformation and Loading). It’s a big phrase for a simple concept: before your database is of any use to you at all, it has to have data in it. This means you have to get the data, collate and format it to meet your specific needs, and then load it into the database.
A quick search on “ETL”, “tools” and “telecom” will give you a couple of million hits. That’s great if you are looking for ETL tools, but less helpful if you just want to get your database loaded. Each phase of this process presents difficulties that must be overcome if your database is to be complete and correct.
Extraction – the gathering of data. Sources can include physical audits, equipment configuration dumps, spreadsheets, legacy systems, in-house databases, hard copies of records, sticky notes… almost anything. Gathering data is easy; gathering the right data isn’t. Which equipment dumps are needed? Is the legacy data accurate enough to use? How do we extract the data? And what does it all mean? Remember that data is only the means to an end: information, which is data in context. If the data extraction team doesn’t understand the context, you may get data, but you won’t get the right information from it.
Transformation – Once you have the data, how do you prepare it to be loaded into the database? This is the phase where the data must be organized according to your specifications, including naming conventions, equipment and network models, bandwidths, and the business rules that define your company. And it all has to be loaded in the right order, according to the internal rules of the Granite® platform. Which takes us to…
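As a rough illustration of the "right order" requirement, a load sequence can be derived from the dependencies between record types. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later). The record types and their dependencies are hypothetical examples chosen for illustration; they are not the actual internal rules of the Granite® platform.

```python
from graphlib import TopologicalSorter

# Hypothetical record types (not the platform's real rules).
# Each key maps to the set of record types that must be loaded before it.
DEPENDENCIES = {
    "sites": set(),
    "equipment": {"sites"},
    "ports": {"equipment"},
    "circuits": {"ports"},
}


def load_order(dependencies):
    """Return record types in an order where every prerequisite comes first."""
    return list(TopologicalSorter(dependencies).static_order())


order = load_order(DEPENDENCIES)
print(order)
```

If the dependencies ever contain a loop, `TopologicalSorter` raises `graphlib.CycleError`, which is a useful early warning that the load plan itself needs rework.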
Loading – when you load the data into the Granite® platform, there are various paths you can take. Some use the middle tier functionality of the Granite® platform, which ensures that your business logic is enforced, but is slow for large data loads. Other paths load directly into the database tables, which is very fast but carries all the risks associated with direct loads. At Sincera, we can load your data the way you want it loaded. And if you’d rather do it yourself with your own resources, we can provide the tools and training to allow you to do it safely and quickly.
When you load data, anything that fails to load is termed “fallout”. Many companies in the professional services sector tout their ability to deal with fallout. Typically this involves taking the data that didn’t load and doing one of two things:
- Changing the data until it loads. This means getting more information from your team, spending more time preprocessing the data, and reloading until most of it loads in some fashion.
- Loosening the validation parameters in the system until the data loads, whether it meets your business needs or not.
Sincera also has a business process for fallout: we avoid having fallout in the first place. A high percentage of fallout (more than 5-10%) is a sign that the data collection, extraction or transformation wasn’t done with the proper strategy and attention to detail. If you’d rather work with your system and your network than deal with endless rounds of fallout management, call Sincera.
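For readers who want to track this figure on their own loads, the fallout percentage is simply the share of attempted records that failed to load. The following is a minimal sketch; the function names are hypothetical, and the default threshold of 5% is an assumption taken from the low end of the range mentioned above.

```python
def fallout_rate(attempted, loaded):
    """Return the fraction of attempted records that failed to load."""
    if attempted <= 0:
        raise ValueError("attempted must be positive")
    if not 0 <= loaded <= attempted:
        raise ValueError("loaded must be between 0 and attempted")
    return (attempted - loaded) / attempted


def needs_review(attempted, loaded, threshold=0.05):
    """Flag a load whose fallout rate exceeds the (assumed) threshold."""
    return fallout_rate(attempted, loaded) > threshold


# Example: 1,000 records attempted, 920 loaded.
rate = fallout_rate(1000, 920)
print(f"Fallout: {rate:.1%}, review needed: {needs_review(1000, 920)}")
```

A flagged load is a prompt to revisit the extraction and transformation steps rather than to relax validation.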