Based on a Fireside Chat webcast on migration theory, practice and technology featuring Nick Cavalancia of Techvangelism and Chris Clark, CEO of Trusted Data Solutions.
If you need to migrate your email archive from on-premises to the cloud, wouldn’t it be great to have an EASY button to push? Presto! Archive migration is done.
Unfortunately for IT professionals, there is no EASY button. As the saying goes, it’s complicated.
That was a key takeaway from a recent Fireside Chat webcast featuring Nick Cavalancia of Techvangelism, and Chris Clark, CEO of Trusted Data Solutions. They have both done email archive migrations, and based on their experience it is never easy. Clark was emphatic that if a consultant tells you it is going to be easy, you had better get that in writing.
Diverse and sometimes archaic or corrupted data formats are among the things that make archive migration complicated. But the good news is that while it is never going to be easy, careful preparation and planning can help you achieve a Zen-like balance between effortless and disastrous.
“One would think you get a migration tool, you point it at the source data then you point it at the destination environment and press start,” explained Nick Cavalancia of Techvangelism. “In concept, it seems like that’s all it really should be. But there’s so much more that needs to be addressed before you select a tool.”
The webcast focused on what IT departments need to consider when they are first tasked with doing an email archive migration.
10 Questions to map out your migration
Cavalancia presented 10 questions designed to help IT professionals understand the scope of the project they are facing. He recommends getting the answers first before starting a search for a migration tool or consultant.
- How do users currently access archived data? Is it already at a portal? Are there multiple different platforms? Is it in a public folder? How they are accessing it today matters because when you migrate, you want to ensure that the data remains as available as possible during the migration.
- What administrative access do you currently have to the archived data? Can you get access to the raw data itself? Is it on a platform where users can do e-discovery? What options do you have to export the data?
- How long will it take to move the data out? This is another big factor: figuring out how long it will take to extract the data, let alone migrate it. Everyone thinks about the migration, but not everyone thinks about the extraction that has to happen first. How long does it take to get the data out of where it is presently?
- Will all the same data be necessary? Just because you used to keep it archived, let’s say five years ago, that doesn’t mean you need it today. So, one of the factors that is going to come into play is how to whittle down and get what you truly need for the business, once you have the new archive set up.
- What features and functionality do you have with your current archive? Knowing this lets you check, when you move to the new one, whether it offers an equivalent for each feature or some alternate functionality that you can utilize as necessary.
- What do you need with your archive? The reason for asking this is to stimulate thinking ahead about what platform you are going to with the migration. For example, it might be Office 365 because you already have your email there. But is that where you need to be going? Are you planning on using Microsoft’s old mentality of just stick it in the mailbox and let it extend out as an older dataset inside that, or do you want to utilize it as a separate archive? Think about e-discovery, and whether you need to have legal holds or not. All of those things come into play.
- How long do you need to keep data? It comes down to an issue of cost: the cost to maintain the data, the cost to manage the data, the cost to get rid of the data.
- How should you remove the old archive data? This is a lot harder than you think. You may be thinking that it’s simply about the age of the data. It’s not. Because it’s not data that you’re archiving, it’s valuable information that you’re archiving. This becomes a question of what is in the archive. Then you determine from there what you should be removing and what you shouldn’t. It’s a much more detailed answer than just going, ah, anything over seven years. That’s not the answer, necessarily.
- How much does your data archive currently cost? How much does it cost now, and how much is it going to cost on the new platform? If you suddenly see the cost go up to keep it somewhere new, you might have to figure out what data you should be removing.
- Finally, what tools are you going to use to move the archive? Will it facilitate all of the things covered in the questions above? Will it facilitate all the functionality? Will it help you eliminate the data you don’t need to retain? Is it going to make sure you move every bit of the data, and maintain data fidelity?
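The question of how long extraction will take lends itself to a back-of-the-envelope calculation. The following is a minimal sketch, not a sizing tool: the function name, the archive size, the throughput and the nightly export window are all hypothetical values chosen for illustration, and real exports are often limited by the archive platform rather than by raw bandwidth.

```python
def estimate_transfer_days(data_terabytes: float,
                           throughput_mb_per_s: float,
                           hours_per_day: float = 24.0) -> float:
    """Rough number of calendar days needed to move data at a sustained rate."""
    if throughput_mb_per_s <= 0 or hours_per_day <= 0:
        raise ValueError("throughput and hours_per_day must be positive")
    total_mb = data_terabytes * 1_000_000      # decimal units: 1 TB = 1,000,000 MB
    seconds = total_mb / throughput_mb_per_s   # time at the sustained rate
    return seconds / 3600 / hours_per_day      # hours, then days of allowed export time


# Hypothetical inputs: a 10 TB archive, 50 MB/s sustained export,
# and an export window of 8 hours per night.
days = estimate_transfer_days(10, 50, hours_per_day=8)
print(f"{days:.1f} days")  # prints "6.9 days"
```

Running the same arithmetic separately for extraction and for loading into the target makes it clear which leg dominates the schedule.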
In reviewing these questions, Chris Clark, CEO of Trusted Data, added that he often sees IT departments focusing on the destination, for example a cloud archive, before they begin to consider how they are going to get there from the typical on-premises archive. Figuring out how to migrate often comes up late in the planning when it would be better to start thinking about it early.
“The journey matters and you’re going to find out a lot of things in your migration, probably that you didn’t know because invariably, the first question we ask is why are you moving?” Clark said. “Why are you moving your archive? And many times either there are performance issues with the current archive, known issues with the current archive, many generations of sources have gone into the archive. It’s good to have situational awareness of what you have upfront because that will dictate many, many decisions and a lot of the process going forward. It’s not a trivial event, in moving your data.”
Taking an empirical view of your data
In the Fireside Chat, Clark made a case for empirically looking at the data in the existing archive before planning how to migrate it. In 95 percent of cases, he said, validation queries show that the client’s initial estimate of how much data is in the archive was much lower than what was actually there.
“If there’s a difference between what you think you have and what you have when you do some analysis, that starts to flag the why issues,” he said. “Why is that the case? You start to pull on a string that can lead to some very troubling revelations. What you think you have in terms of data is not empirically accurate.”
Clark offered the example of an enterprise client that estimated the current archive had 150 million to 200 million messages. When Trusted Data did its empirical analysis it found twice that number.
“That’s a non-trivial deviation,” he said. “What happened was there were several issues that had occurred in the indexing of the archive over time that took a fault and compounded it. So that was a real eye-opener to the client. It wasn’t necessarily a technology flaw or bug. They had upgraded many times. They had plugged more sources to it over time. There were ongoing mistakes that were compounded. These are the types of unintended consequences.”
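The comparison between an estimate and a measured count can be expressed in a few lines. This is a minimal sketch under stated assumptions: the function name and the returned field names are hypothetical, the estimate range echoes the example above, and the measured figure is a made-up value set to double the upper estimate purely for illustration. How the measured count is actually obtained depends on the archive platform and is not shown.

```python
def compare_to_estimate(estimate_low: int, estimate_high: int, measured: int) -> dict:
    """Compare a measured message count against the client's estimated range."""
    if estimate_low <= 0 or estimate_high < estimate_low:
        raise ValueError("estimate range must be positive and ordered")
    return {
        # True only if the measurement falls inside the estimated range.
        "within_estimate": estimate_low <= measured <= estimate_high,
        # How many times larger the archive is than the upper estimate.
        "ratio_to_upper_estimate": measured / estimate_high,
        # Messages that nobody planned for.
        "excess_over_upper_estimate": max(0, measured - estimate_high),
    }


# Illustrative figures only: the measured count is hypothetical.
result = compare_to_estimate(150_000_000, 200_000_000, 400_000_000)
print(result)
```

A result outside the estimated range is the cue to start asking why, before any tool or timeline is chosen.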
If the organization wants to migrate to a new archive because data retrieval from the old archive is too slow, the reason for the poor performance may influence planning for the new archive. For example, if the data has ballooned over the lifetime of the archive – in one instance an organization decided to archive all voice mail – it may be holding more data than it was ever architected to handle.
Cavalancia said part of the problem comes from thinking about migration tactically. How do we move the on-prem email archive to Office 365? But what is missing is strategic planning starting with the kinds of data the organization needs rather than just throwing everything into the archive as if it all had the same value.
“Somebody comes to me with a 10 to 50 terabyte migration, we don’t think about it that way,” Clark replied. “That may be 15 billion emails. So effectively we’re migrating 15 billion things.”
Think before you migrate
Those 15 billion things often include emails from several generations of systems and are in proprietary formats.
“It’s not like it’s open,” Clark explained.
“It’s not like copying data. It’s reading the way it sits on the original archive, somehow unpacking that, turning it into something neutral that can be put into a format that’s fit for purpose to wherever the target archive is. The target archive has a data format it wants to see for each of the messages. So there’s a data format conversion process. It’s not just migrating. Migrating is just moving it from point A to point B. If it were that easy we wouldn’t be having this discussion.”
When it comes to converting emails to the data format the target archive requires, in Clark’s experience about 80 percent will go through fine. But the other 20 percent may be problematic. Also, some of it may be corrupted and may need to be stored outside the new archive, where IT may only try to recover it if there is a pressing business or legal need.
“We have some customers that migrate that data to a pool where if they later need it they can go in and remediate it,” Clark said.
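The idea of a separate pool for problem messages can be sketched as follows. This is an illustration only, not any vendor’s implementation: `ConversionResult`, `convert_archive` and `toy_convert` are hypothetical names, and the toy converter stands in for whatever format conversion the source and target archives actually require.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple


@dataclass
class ConversionResult:
    converted: List[Tuple[str, bytes]] = field(default_factory=list)
    # Each quarantined entry keeps the original bytes plus the failure reason,
    # so the message can be remediated later if the business needs it.
    quarantined: List[Tuple[str, bytes, str]] = field(default_factory=list)


def convert_archive(messages: Iterable[Tuple[str, bytes]],
                    convert: Callable[[bytes], bytes]) -> ConversionResult:
    """Convert each message; route failures to a quarantine pool instead of stopping."""
    result = ConversionResult()
    for message_id, raw in messages:
        try:
            result.converted.append((message_id, convert(raw)))
        except Exception as exc:  # any conversion failure sends the message to the pool
            result.quarantined.append((message_id, raw, str(exc)))
    return result


def toy_convert(raw: bytes) -> bytes:
    """Stand-in converter (assumption): rejects empty payloads, upper-cases the rest."""
    if not raw:
        raise ValueError("empty payload")
    return raw.upper()


outcome = convert_archive([("msg-1", b"hello"), ("msg-2", b"")], toy_convert)
print(len(outcome.converted), "converted;", len(outcome.quarantined), "quarantined")
```

Keeping the untouched original alongside the failure reason is the design choice that makes later remediation possible.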
Cavalancia said that organizations need to make decisions on what kinds of data they will need to move to the new archive. Do you need emails in archaic proprietary formats that go back 30 years? One caution is that part of the migration planning is not an IT issue. IT needs to have the business side make those kinds of decisions, which may be based on legal or regulatory issues balanced against the cost of migrating and storing billions and billions of emails.
In planning for the new archive, Clark suggested making things easier for the next migration.
“The best advice I heard is, when you go into an archive, you need to know how you’re going to get out,” he said. “That will help with the next generation of migration.”
Cavalancia provided a checklist for planning a migration:
- Start with the business requirements: HR, legal, compliance needs
- Consider accessibility to the data during and after the migration
- Ensure that the process is legally sound (a process that proves data wasn’t altered)
- Choose a tool/partner that addresses more than just the move
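One common way to show that data was not altered is to compare cryptographic fingerprints taken before and after the move. The sketch below uses SHA-256 from Python’s standard `hashlib` module; the function names and the in-memory dictionaries are hypothetical. Because a migration usually converts formats, the fingerprint would in practice have to be computed over a normalized representation of each message (an assumption here), not over the raw source and target bytes.

```python
import hashlib
from typing import Dict, List


def fingerprint(payload: bytes) -> str:
    """SHA-256 digest of a message's normalized content."""
    return hashlib.sha256(payload).hexdigest()


def find_altered_or_missing(source: Dict[str, bytes],
                            target: Dict[str, bytes]) -> List[str]:
    """Return ids of messages that are absent from the target or whose content differs."""
    problems = []
    for message_id, payload in source.items():
        migrated = target.get(message_id)
        if migrated is None or fingerprint(migrated) != fingerprint(payload):
            problems.append(message_id)
    return problems


# Hypothetical data: msg-2 was changed in transit and msg-3 never arrived.
source_items = {"msg-1": b"quarterly report", "msg-2": b"contract", "msg-3": b"memo"}
target_items = {"msg-1": b"quarterly report", "msg-2": b"contract (edited)"}
print(find_altered_or_missing(source_items, target_items))  # prints ['msg-2', 'msg-3']
```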
Clark stressed the need for documenting the process. “One of the most valuable components is the documentation of the process and the results. X number of messages were migrated. Y number of emails were problematic and are in separate storage. That’s the most important thing for the business. You have to be able to prove that you did due diligence. Get the business side involved in the diligence. Let the business decide what to do with corrupted data.”
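The kind of record Clark describes can be as simple as a few reconciled counts. This is a minimal sketch with hypothetical field names and made-up numbers; the point it illustrates is that migrated and problematic messages should add up to the source total, with any remainder called out explicitly.

```python
import json


def build_migration_report(source_count: int, migrated_count: int, problem_count: int) -> dict:
    """Summarize a migration so every source message is accounted for."""
    if min(source_count, migrated_count, problem_count) < 0:
        raise ValueError("counts must be non-negative")
    return {
        "messages_in_source": source_count,
        "messages_migrated": migrated_count,
        "messages_in_separate_storage": problem_count,
        # Anything other than zero here needs an explanation before sign-off.
        "messages_unaccounted_for": source_count - migrated_count - problem_count,
    }


# Hypothetical counts for illustration.
report = build_migration_report(1_000, 800, 200)
print(json.dumps(report, indent=2))
```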
About Trusted Data Solutions
For more than two decades, Trusted Data Solutions has led the market in legacy data management. As foremost experts in backup tape restoration, email migration, and voice retrieval, Trusted Data continues to be the preferred choice for eDiscovery companies, corporations, government agencies and legal firms that require a trusted partner for their data transformation initiatives.
Source: Trusted Data Solutions