What to consider when migrating a data warehouse to Amazon Redshift


Customers are migrating data warehouses to Amazon Redshift because it’s fast, scalable, and cost-effective. However, data warehouse migration projects can be complex and challenging. In this post, I help you understand the common drivers of data warehouse migration, migration strategies, and what tools and services are available to assist with your migration project.

Let’s first discuss the big data landscape, the meaning of a modern data architecture, and what you need to consider for your data warehouse migration project when building a modern data architecture.

Business opportunities

Data is changing the way we work, live, and play. All of this behavior change and the movement to the cloud has resulted in a data explosion over the past 20 years. The proliferation of the Internet of Things and smartphones has accelerated the amount of data that’s generated every day. Business models have shifted, and so have the needs of the people running these businesses. We have moved from talking about terabytes of data just a few years ago to petabytes and exabytes of data now. By putting data to work efficiently and building deep business insights from the data collected, businesses in different industries and of various sizes can achieve a wide range of business outcomes. These can be broadly categorized into the following core business outcomes:

  • Improving operational efficiency – By making sense of the data collected from various operational processes, businesses can improve customer experience, increase production efficiency, and increase sales and marketing agility
  • Making more informed decisions – By developing more meaningful insights from a complete picture of data across an organization, businesses can make more informed decisions
  • Accelerating innovation – Combining internal and external data sources enables a variety of AI and machine learning (ML) use cases that help businesses automate processes and unlock business opportunities that were previously either impossible or too difficult to pursue

Business challenges

Exponential data growth has also presented business challenges.

First of all, businesses need to access all data across the organization, and data may be distributed in silos. It comes from a variety of sources, in a wide range of data types, and in large volume and velocity. Some data may be stored as structured data in relational databases. Other data may be stored as semi-structured data in object stores, such as media files and the clickstream data that’s constantly streaming from mobile devices.

Secondly, to build insights from data, businesses need to dive deep into the data by conducting analytics. These analytics activities generally involve dozens or hundreds of data analysts who need to access the system simultaneously. Having a performant system that’s scalable to meet the query demand is often a challenge. It gets more complex when businesses need to share the analyzed data with their customers.

Last but not least, businesses need a cost-effective solution to address data silos, performance, scalability, security, and compliance challenges. Being able to visualize and predict cost is necessary for a business to measure the cost-effectiveness of its solution.

To solve these challenges, businesses need a future-proof modern data architecture and a robust, efficient analytics system.

Modern data architecture

A modern data architecture enables organizations to store any amount of data in open formats, break down disconnected data silos, empower users to run analytics or ML using their preferred tool or technique, and manage who has access to specific pieces of data with the proper security and data governance controls.

The AWS data lake architecture is a modern data architecture that enables you to store data in a data lake and use a ring of purpose-built data services around the lake, as shown in the following figure. This allows you to make decisions with speed and agility, at scale, and cost-effectively. For more details, refer to Modern Data Architecture on AWS.

Modern data warehouse

Amazon Redshift is a fully managed, scalable, modern data warehouse that accelerates time to insights with fast, easy, and secure analytics at scale. With Amazon Redshift, you can analyze all your data and get performance at any scale with low and predictable costs.

Amazon Redshift offers the following benefits:

  • Analyze all your data – With Amazon Redshift, you can easily analyze all your data across your data warehouse and data lake with consistent security and governance policies. We call this the modern data architecture. With Amazon Redshift Spectrum, you can query data in your data lake with no need for loading or other data preparation. And with data lake export, you can save the results of an Amazon Redshift query back into the lake. This means you can take advantage of real-time analytics and ML/AI use cases without re-architecture, because Amazon Redshift is fully integrated with your data lake. With new capabilities like data sharing, you can easily share data across Amazon Redshift clusters both internally and externally, so everyone has a live and consistent view of the data. Amazon Redshift ML makes it easy to do more with your data: you can create, train, and deploy ML models using familiar SQL commands directly in Amazon Redshift data warehouses.
  • Fast performance at any scale – Amazon Redshift is a self-tuning and self-learning system that allows you to get the best performance for your workloads without the undifferentiated heavy lifting of tuning your data warehouse with tasks such as defining sort keys and distribution keys. New capabilities include materialized views, auto-refresh, and auto-query rewrite. Amazon Redshift scales to deliver consistently fast results from gigabytes to petabytes of data, and from a few users to thousands. As your user base scales to thousands of concurrent users, the concurrency scaling capability automatically deploys the necessary compute resources to manage the additional load. Amazon Redshift RA3 instances with managed storage separate compute and storage, so you can scale each independently and only pay for the storage you need. AQUA (Advanced Query Accelerator) for Amazon Redshift is a new distributed and hardware-accelerated cache that automatically boosts certain types of queries.
  • Easy analytics for everyone – Amazon Redshift is a fully managed data warehouse that abstracts away the burden of detailed infrastructure management or performance optimization. You can focus on getting to insights, rather than performing maintenance tasks like provisioning infrastructure, creating backups, setting up the layout of data, and other tasks. You can operate on data in open formats, use familiar SQL commands, and take advantage of query visualizations available through the new Query Editor v2. You can also access data from any application through a secure data API without configuring software drivers or managing database connections. Amazon Redshift is compatible with business intelligence (BI) tools, opening up the power and integration of Amazon Redshift to business users who operate from within the BI tool.

A modern data architecture with a data lake architecture and a modern data warehouse with Amazon Redshift helps businesses of all sizes address big data challenges, make sense of a large amount of data, and drive business outcomes. You can start the journey of building a modern data architecture by migrating your data warehouse to Amazon Redshift.

Migration considerations

Data warehouse migration presents a challenge in terms of project complexity and poses a risk in terms of resources, time, and cost. To reduce the complexity of data warehouse migration, it’s essential to choose the right migration strategy based on your existing data warehouse landscape and the amount of transformation required to migrate to Amazon Redshift. The following are the key factors that can influence your migration strategy decision:

  • Size – The total size of the source data warehouse to be migrated is determined by the objects, tables, and databases that are included in the migration. A good understanding of the data sources and data domains required for moving to Amazon Redshift leads to an optimal sizing of the migration project.
  • Data transfer – Data warehouse migration involves data transfer between the source data warehouse servers and AWS. You can either transfer data over a network interconnection between the source location and AWS, such as AWS Direct Connect, or transfer data offline via tools or services such as the AWS Snow Family.
  • Data change rate – How often do data updates or changes occur in your data warehouse? The data change rate of your existing data warehouse determines the update intervals required to keep the source data warehouse and the target Amazon Redshift in sync. A source data warehouse with a high data change rate requires the service switching from the source to Amazon Redshift to complete within an update interval, which leads to a shorter migration cutover window.
  • Data transformation – Moving your existing data warehouse to Amazon Redshift is a heterogeneous migration involving data transformation such as data mapping and schema change. The complexity of data transformation determines the processing time required for an iteration of migration.
  • Migration and ETL tools – The selection of migration and extract, transform, and load (ETL) tools can impact the migration project. For example, the effort required for deployment and setup of these tools can vary. We look closer at AWS tools and services shortly.
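To make the data transfer factor more concrete, the following is a minimal back-of-the-envelope sketch of how long an online transfer might take. The function names, the 80% link utilization default, and the idea of comparing the result against a deadline are illustrative assumptions, not AWS guidance; real throughput depends on many factors this sketch ignores.

```python
def transfer_days(data_tb: float, bandwidth_mbps: float, utilization: float = 0.8) -> float:
    """Estimate the days needed to move data_tb (decimal terabytes) over a link.

    bandwidth_mbps is the nominal link speed in megabits per second, and
    utilization is the assumed fraction of it that the migration can use.
    """
    if data_tb < 0 or bandwidth_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("invalid size, bandwidth, or utilization")
    bits = data_tb * 1e12 * 8                      # terabytes -> bits
    seconds = bits / (bandwidth_mbps * 1e6 * utilization)
    return seconds / 86_400                        # seconds -> days


def offline_transfer_worth_considering(data_tb: float, bandwidth_mbps: float,
                                       max_days: float) -> bool:
    """Hypothetical rule of thumb: look at offline transfer when the
    estimated online transfer would exceed the time you can afford."""
    return transfer_days(data_tb, bandwidth_mbps) > max_days


if __name__ == "__main__":
    # Example: 100 TB over a 1 Gbps link with the default utilization.
    print(round(transfer_days(100, 1000), 1), "days")
```

If the estimate exceeds the time you can afford, that is a hint to evaluate an offline option such as the AWS Snow Family alongside the network route.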

After you have factored in all these considerations, you can pick a migration strategy option for your Amazon Redshift migration project.

Migration strategies

You can choose from three migration strategies: one-step migration, two-step migration, or wave-based migration.

One-step migration is a good option for databases that don’t require continuous operation, such as continuous replication to keep ongoing data changes in sync between the source and destination. You can extract existing databases as comma-separated value (CSV) files, or in a columnar format like Parquet, then use AWS Snow Family services such as AWS Snowball to deliver datasets to Amazon Simple Storage Service (Amazon S3) for loading into Amazon Redshift. You then test the destination Amazon Redshift database for data consistency with the source. After all validations have passed, the database is switched over to AWS.
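As an illustration of the final load from Amazon S3, the following sketch builds an Amazon Redshift COPY statement for Parquet or CSV files. The table name, bucket path, and IAM role ARN are placeholders, the helper function is hypothetical, and the CSV branch assumes the files have a single header row. The sketch only builds the SQL text; running it against a cluster is left to whichever SQL client you use.

```python
import re

# Simple pattern for "table" or "schema.table"; an assumption for this sketch.
_TABLE_PATTERN = r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?"


def build_copy_statement(table: str, s3_uri: str, iam_role_arn: str,
                         file_format: str = "PARQUET") -> str:
    """Return a Redshift COPY statement that loads files from S3 into table."""
    if not re.fullmatch(_TABLE_PATTERN, table):
        raise ValueError(f"unexpected table name: {table!r}")
    if not s3_uri.startswith("s3://") or "'" in s3_uri or "'" in iam_role_arn:
        raise ValueError("unexpected S3 URI or IAM role ARN")

    fmt = file_format.upper()
    if fmt == "PARQUET":
        options = "FORMAT AS PARQUET"
    elif fmt == "CSV":
        options = "FORMAT AS CSV IGNOREHEADER 1"   # assumes one header row
    else:
        raise ValueError(f"unsupported format: {file_format!r}")

    return f"COPY {table} FROM '{s3_uri}' IAM_ROLE '{iam_role_arn}' {options};"


if __name__ == "__main__":
    print(build_copy_statement(
        "sales.orders",                                         # placeholder table
        "s3://example-bucket/orders/",                          # placeholder bucket
        "arn:aws:iam::123456789012:role/ExampleRedshiftRole",   # placeholder role
    ))
```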

Two-step migration is commonly used for databases of any size that require continuous operation, such as continuous replication. During the migration, the source databases have ongoing data changes, and continuous replication keeps data changes in sync between the source and Amazon Redshift. The breakdown of the two-step migration strategy is as follows:

  • Initial data migration – The data is extracted from the source database, preferably during non-peak usage to minimize the impact. The data is then migrated to Amazon Redshift by following the one-step migration approach described previously.
  • Changed data migration – Data that changed in the source database after the initial data migration is propagated to the destination before switchover. This step synchronizes the source and destination databases. After all the changed data is migrated, you can validate the data in the destination database and perform necessary tests. If all tests pass, you then switch over to the Amazon Redshift data warehouse.
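One simple form the validation before switchover could take is a per-table row count comparison between source and destination. The sketch below assumes you have already collected the counts (for example, with SELECT COUNT(*) queries on each side) into two dictionaries; the function name and sample table names are hypothetical, and row counts alone don’t prove the data matches, so treat this as a first check rather than a complete test.

```python
from typing import Dict, Optional, Tuple


def find_count_mismatches(
    source_counts: Dict[str, int],
    target_counts: Dict[str, int],
) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Return {table: (source_count, target_count)} for every table whose
    counts differ. A table missing on one side is reported with None."""
    mismatches = {}
    for table in sorted(set(source_counts) | set(target_counts)):
        source = source_counts.get(table)
        target = target_counts.get(table)
        if source != target:
            mismatches[table] = (source, target)
    return mismatches


if __name__ == "__main__":
    source = {"orders": 1_000, "customers": 250, "returns": 40}
    target = {"orders": 1_000, "customers": 249}
    for table, (src, tgt) in find_count_mismatches(source, target).items():
        print(f"{table}: source={src} target={tgt}")
```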

Wave-based migration is suitable for large-scale data warehouse migration projects. The principle of wave-based migration is taking precautions to divide a complex migration project into multiple logical and systematic waves. This strategy can significantly reduce the complexity and risk. You start from a workload that covers a good number of data sources and subject areas with medium complexity, then add more data sources and subject areas in each subsequent wave. With this strategy, you run both the source data warehouse and Amazon Redshift production environments in parallel for a certain amount of time before you can fully retire the source data warehouse. See Develop an application migration methodology to modernize your data warehouse with Amazon Redshift for details on how to identify and group data sources and analytics applications to migrate from the source data warehouse to Amazon Redshift using the wave-based migration approach.

To guide your migration strategy decision, refer to the following table to map the consideration factors with a preferred migration strategy.

Consideration | One-Step Migration | Two-Step Migration | Wave-Based Migration
Number of subject areas in migration scope | Small | Medium to Large | Medium to Large
Data transfer volume | Small to Large | Small to Large | Small to Large
Data change rate during migration | None | Minimal to Frequent | Minimal to Frequent
Data transformation complexity | Any | Any | Any
Migration change window for switching from source to target | Hours | Seconds | Seconds
Migration project duration | Weeks | Weeks to Months | Months
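The table can be read as a small decision rule. The following sketch is one possible reading of it, not an official selection procedure: the function name, the three inputs, and the way ambiguous cases are resolved (for example, a large scope with no ongoing data changes) are my assumptions.

```python
def suggest_strategy(subject_area_scope: str,
                     data_changes_during_migration: bool,
                     large_scale_project: bool) -> str:
    """Suggest a migration strategy from three simplified inputs.

    subject_area_scope: "small", "medium", or "large".
    data_changes_during_migration: True if the source keeps changing and
        must stay in sync with Amazon Redshift until cutover.
    large_scale_project: True if the project is big enough to be worth
        splitting into waves (a judgment call, not a fixed threshold).
    """
    scope = subject_area_scope.lower()
    if scope not in ("small", "medium", "large"):
        raise ValueError(f"unknown scope: {subject_area_scope!r}")

    # One-step fits a small scope with no ongoing changes to replicate.
    if scope == "small" and not data_changes_during_migration:
        return "one-step"
    # Wave-based targets large-scale projects that benefit from iterations.
    if large_scale_project:
        return "wave-based"
    # Otherwise, initial load plus changed-data migration.
    return "two-step"


if __name__ == "__main__":
    print(suggest_strategy("small", False, False))
    print(suggest_strategy("medium", True, False))
    print(suggest_strategy("large", True, True))
```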

Migration process

In this section, we review the three high-level steps of the migration process. The two-step migration strategy and wave-based migration strategy involve all three migration steps. However, the wave-based migration strategy includes a number of iterations. Because only databases that don’t require continuous operations are good fits for one-step migration, that strategy requires only Steps 1 and 2 of the migration process.

Step 1: Convert schema and subject area

In this step, you make the source data warehouse schema compatible with the Amazon Redshift schema by converting the source data warehouse schema using schema conversion tools such as AWS Schema Conversion Tool (AWS SCT) and other tools from AWS partners. In some situations, you may also be required to use custom code to conduct complex schema conversions. We dive deeper into AWS SCT and migration best practices in a later section.

Step 2: Initial data extraction and load

In this step, you complete the initial data extraction and load the source data into Amazon Redshift for the first time. You can use AWS SCT data extractors to extract data from the source data warehouse and load data to Amazon S3 if your data size and data transfer requirements allow you to transfer data over the interconnected network. Alternatively, if there are limitations such as a network capacity limit, you can load data to Snowball, and from there the data gets loaded to Amazon S3. When the data in the source data warehouse is available on Amazon S3, it’s loaded to Amazon Redshift. In situations when the source data warehouse native tools do a better data unload and load job than AWS SCT data extractors, you may choose to use the native tools to complete this step.

Step 3: Delta and incremental load

In this step, you use AWS SCT and sometimes source data warehouse native tools to capture and load delta or incremental changes from sources to Amazon Redshift. This is often referred to as change data capture (CDC). CDC is a process that captures changes made in a database, and ensures that those changes are replicated to a destination such as a data warehouse.
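To show the idea behind CDC in miniature, the following sketch replays a list of captured change events against an in-memory copy of a table keyed by primary key. The event shape (op, key, row) and the function name are hypothetical and chosen only for illustration; this is not how AWS SCT or AWS DMS represent or apply changes internally.

```python
from typing import Any, Dict, Iterable


def apply_changes(target: Dict[Any, dict], changes: Iterable[dict]) -> Dict[Any, dict]:
    """Apply captured changes, in order, to a target table held as
    {primary_key: row}. Each change is a dict with "op" and "key", plus
    "row" for inserts and updates."""
    for change in changes:
        op, key = change["op"], change["key"]
        if op in ("insert", "update"):
            target[key] = change["row"]     # upsert the latest row image
        elif op == "delete":
            target.pop(key, None)           # ignore deletes of missing rows
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return target


if __name__ == "__main__":
    # State of the target after the initial load.
    target = {1: {"status": "new"}, 2: {"status": "new"}}
    # Changes captured on the source since the initial load.
    delta = [
        {"op": "update", "key": 1, "row": {"status": "shipped"}},
        {"op": "delete", "key": 2},
        {"op": "insert", "key": 3, "row": {"status": "new"}},
    ]
    print(apply_changes(target, delta))
```

Order matters here: replaying the same events in a different order can leave the target in a different state, which is one reason the changes need to be applied in the sequence they were captured.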

You should now have enough information to start developing a migration plan for your data warehouse. In the following section, I dive deeper into the AWS services that can help you migrate your data warehouse to Amazon Redshift, and the best practices of using these services to accelerate a successful delivery of your data warehouse migration project.

Data warehouse migration services

Data warehouse migration involves a set of services and tools to support the migration process. You begin by creating a database migration assessment report and then converting the source data schema to be compatible with Amazon Redshift by using AWS SCT. To move data, you can use the AWS SCT data extraction tool, which has integration with AWS Database Migration Service (AWS DMS) to create and manage AWS DMS tasks and orchestrate data migration.

To transfer source data over the interconnected network between the source and AWS, you can use AWS Storage Gateway, Amazon Kinesis Data Firehose, Direct Connect, AWS Transfer Family services, Amazon S3 Transfer Acceleration, and AWS DataSync. For data warehouse migration involving a large volume of data, or if there are constraints with the interconnected network capacity, you can transfer data using the AWS Snow Family of services. With this approach, you can copy the data to the device, send it back to AWS, and have the data copied to Amazon Redshift via Amazon S3.

AWS SCT is an essential service to accelerate your data warehouse migration to Amazon Redshift. Let’s dive deeper into it.

Migrating using AWS SCT

AWS SCT automates much of the process of converting your data warehouse schema to an Amazon Redshift database schema. Because the source and target database engines can have many different features and capabilities, AWS SCT attempts to create an equivalent schema in your target database wherever possible. If no direct conversion is possible, AWS SCT creates a database migration assessment report to help you convert your schema. The database migration assessment report provides important information about the conversion of the schema from your source database to your target database. The report summarizes all the schema conversion tasks and details the action items for schema objects that can’t be converted to the DB engine of your target database. The report also includes estimates of the amount of effort that it will take to write the equivalent code in your target database for objects that can’t be converted automatically.

Storage optimization is the heart of a data warehouse conversion. When using your Amazon Redshift database as a source and a test Amazon Redshift database as the target, AWS SCT recommends sort keys and distribution keys to optimize your database.

With AWS SCT, you can convert the following data warehouse schemas to Amazon Redshift:

  • Amazon Redshift
  • Azure Synapse Analytics (version 10)
  • Greenplum Database (version 4.3 and later)
  • Microsoft SQL Server (version 2008 and later)
  • Netezza (version 7.0.3 and later)
  • Oracle (version 10.2 and later)
  • Snowflake (version 3)
  • Teradata (version 13 and later)
  • Vertica (version 7.2 and later)

At AWS, we continue to release new features and enhancements to improve our product. For the latest supported conversions, visit the AWS SCT User Guide.

Migrating data using AWS SCT data extraction tool

You can use an AWS SCT data extraction tool to extract data from your on-premises data warehouse and migrate it to Amazon Redshift. The agent extracts your data and uploads the data to either Amazon S3 or, for large-scale migrations, an AWS Snowball Family service. You can then use AWS SCT to copy the data to Amazon Redshift. Amazon S3 is a storage and retrieval service. To store an object in Amazon S3, you upload the file you want to store to an S3 bucket. When you upload a file, you can set permissions on the object and also on any metadata.

In large-scale migrations involving data upload to an AWS Snowball Family service, you can use wizard-based workflows in AWS SCT to automate the process in which the data extraction tool orchestrates AWS DMS to perform the actual migration.

Considerations for Amazon Redshift migration tools

To improve and accelerate data warehouse migration to Amazon Redshift, consider the following tips and best practices. This list is not exhaustive. Make sure you have a good understanding of your data warehouse profile and determine which best practices you can use for your migration project.

  • Use AWS SCT to create a migration assessment report and scope migration effort.
  • Automate migration with AWS SCT where possible. The experience from our customers shows that AWS SCT can automatically create the majority of DDL and SQL scripts.
  • When automated schema conversion is not possible, use custom scripting for the code conversion.
  • Install AWS SCT data extractor agents as close as possible to the data source to improve data migration performance and reliability.
  • To improve data migration performance, properly size the Amazon Elastic Compute Cloud (Amazon EC2) instances, or the equivalent virtual machines, that the data extractor agents are installed on.
  • Configure multiple data extractor agents to run multiple tasks in parallel to improve data migration performance by maximizing the usage of the allocated network bandwidth.
  • Adjust AWS SCT memory configuration to improve schema conversion performance.
  • Use Amazon S3 to store large objects such as images, PDFs, and other binary data from your existing data warehouse.
  • To migrate large tables, use virtual partitioning and create sub-tasks to improve data migration performance.
  • Understand the use cases of AWS services such as Direct Connect, the AWS Transfer Family, and the AWS Snow Family. Select the right service or tool to meet your data migration requirements.
  • Understand AWS service quotas and make informed migration design decisions.
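The tip about virtual partitioning for large tables can be pictured as splitting a table’s key range into contiguous slices that separate sub-tasks extract in parallel. The sketch below shows only that range-splitting arithmetic for an integer key. The function name is hypothetical, and how AWS SCT actually defines and schedules its partitions is not something this sketch tries to reproduce.

```python
from typing import List, Tuple


def range_partitions(min_key: int, max_key: int, parts: int) -> List[Tuple[int, int]]:
    """Split the inclusive key range [min_key, max_key] into up to `parts`
    contiguous, non-overlapping ranges of nearly equal size."""
    if parts < 1 or max_key < min_key:
        raise ValueError("need parts >= 1 and max_key >= min_key")

    total = max_key - min_key + 1
    parts = min(parts, total)               # never create empty ranges
    base, extra = divmod(total, parts)      # first `extra` ranges get one more key

    ranges = []
    start = min_key
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


if __name__ == "__main__":
    # Each (low, high) pair could drive one extraction sub-task, for example
    # with a predicate such as: WHERE id BETWEEN low AND high
    for low, high in range_partitions(1, 10, 3):
        print(low, high)
```

Equal-width key ranges only give balanced sub-tasks when the keys are spread fairly evenly; a skewed key distribution would call for boundaries based on actual row counts.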

Summary

Data is growing in volume and complexity faster than ever. However, only a fraction of this invaluable asset is available for analysis. Traditional on-premises data warehouses have rigid architectures that don’t scale for modern big data analytics use cases. These traditional data warehouses are expensive to set up and operate, and require large upfront investments in both software and hardware.

In this post, we discussed Amazon Redshift as a fully managed, scalable, modern data warehouse that can help you analyze all your data, and achieve performance at any scale with low and predictable cost. To migrate your data warehouse to Amazon Redshift, you need to consider a range of factors, such as the total size of the data warehouse, data change rate, and data transformation complexity, before picking a suitable migration strategy and process to reduce the complexity and cost of your data warehouse migration project. With AWS services such as AWS SCT and AWS DMS, and by adopting the tips and the best practices of these services, you can automate migration tasks, scale migration, accelerate the delivery of your data warehouse migration project, and delight your customers.


About the Author

Lewis Tang is a Senior Solutions Architect at Amazon Web Services based in Sydney, Australia. Lewis provides partners guidance on a broad range of AWS services and helps partners accelerate AWS practice growth.
