Database Migration to Cloud: SQL, NoSQL, and Managed DB Services
Database migration to the cloud encompasses the process of moving relational, non-relational, and hybrid data stores from on-premises infrastructure—or from one cloud environment to another—to cloud-hosted or fully managed database services. This page covers the structural mechanics of SQL and NoSQL migration paths, the managed service landscape across major cloud providers, and the classification boundaries that determine which approach applies to a given workload. Understanding these distinctions is essential for teams managing compliance obligations under frameworks such as HIPAA, FedRAMP, and PCI DSS, where data residency and access control requirements directly constrain migration architecture choices.
- Definition and scope
- Core mechanics or structure
- Causal relationships or drivers
- Classification boundaries
- Tradeoffs and tensions
- Common misconceptions
- Checklist or steps (non-advisory)
- Reference table or matrix
- References
Definition and scope
Database migration to the cloud refers to the structured transfer of database schemas, data objects, stored procedures, triggers, indexes, and associated application dependencies from a source environment to a target cloud platform. The scope extends beyond simple data copy operations to include schema conversion, character encoding normalization, data type reconciliation, connection string reconfiguration, and validation of query output parity between source and target systems.
The National Institute of Standards and Technology (NIST SP 800-145) defines cloud computing as a model enabling on-demand network access to a shared pool of configurable computing resources, a definition that directly frames the target state of database migration: shifting from operator-managed server instances to provider-managed or self-managed cloud database services. The practical scope of a database migration project typically spans three layers—infrastructure (compute and storage), platform (database engine and version), and data (schemas, records, and access policies).
Database migration is a specialized subset of the broader category of data migration to the cloud, distinguished by the structured schema requirements and transactional consistency guarantees that relational and non-relational engines enforce.
Core mechanics or structure
Database migration executes through four discrete phases regardless of the engine type involved.
Schema extraction and conversion. The source schema—tables, views, foreign key relationships, stored procedures, and user-defined types—is exported and analyzed for compatibility with the target engine. AWS Schema Conversion Tool (AWS SCT) and the open-source pgloader utility are documented tools for this phase. Schema conversion complexity is highest when migrating across engine families, such as Oracle to PostgreSQL, because procedural language constructs (PL/SQL vs. PL/pgSQL) differ substantially.
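As a rough illustration of the data type reconciliation step, the sketch below maps a handful of Oracle column types to PostgreSQL equivalents and flags anything it does not recognize for manual review. The mapping table is a small, commonly used subset and the function names are hypothetical; it is not how AWS SCT or pgloader work internally, and real tools apply many more context-dependent rules.

```python
import re

# Illustrative subset of an Oracle -> PostgreSQL type map. Real conversion
# tools apply many more rules (precision, defaults, constraints).
ORACLE_TO_POSTGRES = {
    "VARCHAR2": "VARCHAR",
    "NVARCHAR2": "VARCHAR",
    "NUMBER": "NUMERIC",
    "DATE": "TIMESTAMP(0)",  # Oracle DATE also stores a time of day
    "CLOB": "TEXT",
    "BLOB": "BYTEA",
    "RAW": "BYTEA",
}

_TYPE_PATTERN = re.compile(r"^\s*([A-Za-z0-9_]+)\s*(\(.*\))?\s*$")


def convert_type(oracle_type):
    """Return the PostgreSQL type for one Oracle type, or None if unmapped."""
    match = _TYPE_PATTERN.match(oracle_type)
    if match is None:
        return None
    base = match.group(1).upper()
    args = match.group(2) or ""
    target = ORACLE_TO_POSTGRES.get(base)
    if target is None:
        return None
    # Types that take no length argument, or that already carry one.
    if target in ("TEXT", "BYTEA") or "(" in target:
        return target
    # Drop Oracle's CHAR/BYTE length qualifier, e.g. VARCHAR2(100 CHAR).
    args = re.sub(r"\s+(CHAR|BYTE)\s*\)", ")", args, flags=re.IGNORECASE)
    return target + args


def convert_columns(columns):
    """Split {column: oracle_type} into converted types and columns needing review."""
    converted, needs_review = {}, []
    for name, oracle_type in columns.items():
        pg_type = convert_type(oracle_type)
        if pg_type is None:
            needs_review.append(name)
        else:
            converted[name] = pg_type
    return converted, needs_review
```

Returning unmapped columns in a separate list, rather than guessing a type, mirrors the way assessment reports separate automatically convertible objects from those that need manual work.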
Bulk data load. The initial data load moves existing records from source to target using export formats such as CSV, Parquet, or native binary dumps. For large datasets, this phase typically uses parallel load workers. AWS Database Migration Service (DMS) documentation specifies that the full-load phase can run concurrently with ongoing source transactions when change data capture (CDC) is enabled.
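The parallel-worker idea can be sketched by splitting a table's primary-key range into contiguous chunks and copying each chunk in its own worker. In this minimal example the source and target are plain dictionaries standing in for tables, and `key_ranges`, `read_range`, and `parallel_load` are hypothetical names; a real loader would issue range-bounded queries against the source engine instead.

```python
from concurrent.futures import ThreadPoolExecutor


def key_ranges(min_id, max_id, chunk_size):
    """Split the inclusive range [min_id, max_id] into contiguous inclusive chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    ranges = []
    start = min_id
    while start <= max_id:
        end = min(start + chunk_size - 1, max_id)
        ranges.append((start, end))
        start = end + 1
    return ranges


def read_range(source, bounds):
    """Read the rows whose key falls inside one chunk (stand-in for a range query)."""
    low, high = bounds
    return {key: row for key, row in source.items() if low <= key <= high}


def parallel_load(source, target, chunk_size, workers=4):
    """Copy all rows from source to target chunk by chunk; return rows copied."""
    if not source:
        return 0
    ranges = key_ranges(min(source), max(source), chunk_size)
    copied = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Workers only read; the main thread performs the writes to the target.
        for rows in pool.map(lambda bounds: read_range(source, bounds), ranges):
            target.update(rows)
            copied += len(rows)
    return copied
```

Recording the copied row count per run gives a first figure to compare against the source during validation.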
Change data capture (CDC). CDC tracks row-level inserts, updates, and deletes committed on the source database while the bulk load runs and after it completes. The log-based CDC method reads transaction logs (e.g., Oracle redo logs, MySQL binlog, SQL Server transaction log) and replicates changes to the target continuously, which minimizes the cutover window. This is the primary mechanism for achieving near-zero-downtime migration; the cloud migration downtime minimization strategies page covers it in detail.
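A minimal sketch of the apply side of log-based CDC follows. It assumes each change carries a monotonically increasing log sequence number (`lsn`) and that the target is a dictionary keyed by primary key; the `ChangeEvent` shape is hypothetical and does not reflect the record format of any particular engine or of AWS DMS. Skipping events at or below the last applied position makes re-delivery harmless.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    lsn: int                    # position of the change in the source log
    op: str                     # "insert", "update", or "delete"
    key: int                    # primary key of the affected row
    row: Optional[dict] = None  # new row image; unused for deletes


def apply_changes(target, events, last_applied_lsn=0):
    """Apply events in log order and return the new last-applied position."""
    for event in sorted(events, key=lambda e: e.lsn):
        if event.lsn <= last_applied_lsn:
            continue  # already applied in an earlier batch
        if event.op in ("insert", "update"):
            if event.row is None:
                raise ValueError(f"{event.op} at lsn {event.lsn} has no row image")
            target[event.key] = dict(event.row)
        elif event.op == "delete":
            target.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op}")
        last_applied_lsn = event.lsn
    return last_applied_lsn
```

Persisting the returned position alongside the target data is what lets a replication task resume after an interruption without replaying changes twice.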
Cutover and validation. The final phase halts writes to the source, applies residual CDC changes to the target, and redirects application connections. Validation compares row counts, checksum aggregates, and sample query outputs between source and target to confirm data integrity.
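The row-count and checksum comparison can be sketched with two in-memory SQLite databases standing in for source and target. The `orders` table and the helper names are assumptions made for the example. Hashing `repr(row)` only works here because both sides use the same engine and driver; a cross-engine comparison would first need to normalize values (numeric precision, timestamps, encodings) into a canonical form.

```python
import hashlib
import sqlite3


def table_fingerprint(conn, table, key_column):
    """Return (row_count, sha256 hex digest) over all rows in key order."""
    # Identifiers are interpolated directly; this sketch assumes they are trusted.
    cursor = conn.execute(f'SELECT * FROM "{table}" ORDER BY "{key_column}"')
    digest = hashlib.sha256()
    row_count = 0
    for row in cursor:
        digest.update(repr(row).encode("utf-8"))
        row_count += 1
    return row_count, digest.hexdigest()


def compare_tables(source, target, table, key_column):
    """Compare one table between two connections by count and checksum."""
    src_count, src_hash = table_fingerprint(source, table, key_column)
    tgt_count, tgt_hash = table_fingerprint(target, table, key_column)
    return {
        "row_count_match": src_count == tgt_count,
        "checksum_match": src_hash == tgt_hash,
    }


def make_demo_db(rows):
    """Build an in-memory database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()
    return conn
```

A matching row count with a mismatched checksum, as produced by a single altered value, is the case that count-only validation misses.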
Causal relationships or drivers
Three primary drivers push organizations toward cloud database migration.
Infrastructure lifecycle pressure. On-premises database servers typically carry 3-to-5-year hardware refresh cycles. End-of-support events—such as Microsoft SQL Server 2012 reaching end of extended support in July 2022 (Microsoft Lifecycle Policy)—create security exposure that accelerates migration timelines, because unpatched engines become non-compliant under frameworks such as PCI DSS 4.0 (PCI Security Standards Council).
Managed service economics. Managed database services eliminate DBA overhead for patching, backup management, high availability configuration, and storage provisioning. The shift converts capital expenditure on database server hardware to operational expenditure on consumption-based pricing. Cloud migration cost estimation analysis must account for license mobility implications, particularly for Oracle Database and Microsoft SQL Server, where bring-your-own-license (BYOL) rules differ by cloud provider and by deployment model (dedicated host vs. shared tenancy).
Scalability requirements. Relational databases on fixed hardware face vertical scaling ceilings. NoSQL managed services such as Amazon DynamoDB, Google Cloud Firestore, and Azure Cosmos DB offer horizontal partitioning (sharding) by design, enabling throughput scaling without schema redesign. This is a structural driver for migrating high-volume event, session, and telemetry workloads away from relational engines.
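Horizontal partitioning can be illustrated with a generic hash-based shard assignment. This is a sketch of the general technique only: the managed services named above place partitions internally and do not expose this logic, and `shard_for` and `distribute` are hypothetical names.

```python
import hashlib


def shard_for(partition_key, shard_count):
    """Map a partition key to a shard index using a stable hash."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def distribute(keys, shard_count):
    """Group keys by the shard they hash to."""
    shards = {index: [] for index in range(shard_count)}
    for key in keys:
        shards[shard_for(key, shard_count)].append(key)
    return shards
```

A stable hash (rather than Python's built-in `hash`, which is randomized per process for strings) keeps the assignment reproducible across workers. How evenly load spreads still depends on choosing a partition key with many distinct values.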
Classification boundaries
Database migration scenarios fall into four distinct categories based on source engine type and target service model.
Homogeneous migration occurs when source and target use the same database engine, ideally at the same version (e.g., MySQL 8.0 on-premises to Amazon RDS for MySQL 8.0). Schema compatibility is high, and migration tools require minimal conversion logic. This is the lowest-risk migration class. A version upgrade within the same engine remains homogeneous but still warrants compatibility testing.
Heterogeneous migration involves a change of database engine (e.g., Oracle Database 19c to Amazon Aurora PostgreSQL). Schema conversion is required, and procedural code must be rewritten manually or converted using tools like AWS SCT. Heterogeneous migrations carry significantly higher risk of application breakage due to SQL dialect differences.
Lift-and-shift database migration moves the database engine to an Infrastructure-as-a-Service (IaaS) virtual machine without changing the engine version or management model. This approach, detailed on the lift-and-shift migration explained page, preserves compatibility but forfeits managed service benefits such as automated failover and read replica provisioning.
Replatforming to managed services replaces self-managed engine instances with Platform-as-a-Service (PaaS) offerings such as Amazon RDS, Azure SQL Managed Instance, Google Cloud SQL, or equivalent NoSQL managed services. This maps to the "replatform" strategy in the AWS 7 Rs framework (AWS Migration Documentation). It requires application connection string changes and often minor schema adjustments to accommodate managed service constraints (e.g., no direct OS access, restricted superuser privileges). The distinction between replatforming and refactoring is covered on the replatforming vs. refactoring cloud page.
Tradeoffs and tensions
Managed service constraints vs. operational control. Amazon RDS, Azure Database for PostgreSQL, and Google Cloud SQL impose restrictions that do not exist on self-managed instances—no OS-level access, restricted DBA superuser capabilities, limited extension support, and enforced maintenance windows. Organizations with workloads dependent on custom storage engines, non-standard PostgreSQL extensions, or complex linked server configurations encounter friction when migrating to PaaS.
NoSQL flexibility vs. consistency guarantees. NoSQL engines optimized for high write throughput (DynamoDB, Cassandra, MongoDB Atlas) typically offer eventual consistency by default rather than ACID transaction guarantees across distributed partitions. Workloads migrated from relational engines that assumed strong consistency may produce incorrect results unless application logic is rewritten to handle eventual consistency semantics. The CAP theorem, conjectured by Eric Brewer in 2000 and formally proved by Seth Gilbert and Nancy Lynch in 2002, frames this tension: distributed data stores cannot simultaneously guarantee consistency, availability, and partition tolerance.
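One common way to reason about this tension is the quorum rule used by Dynamo-style stores with tunable consistency: with N replicas, a read that contacts R replicas is guaranteed to see a write acknowledged by W replicas only when R + W > N. The sketch below is an assumption-laden toy model (integer versions, a worst-case choice of replicas, hypothetical function names), not the behavior of any specific managed service.

```python
def quorums_overlap(n_replicas, write_quorum, read_quorum):
    """Return True when every read quorum must intersect every write quorum."""
    for name, value in (("write_quorum", write_quorum), ("read_quorum", read_quorum)):
        if not 1 <= value <= n_replicas:
            raise ValueError(f"{name} must be between 1 and {n_replicas}")
    return read_quorum + write_quorum > n_replicas


def worst_case_read(n_replicas, write_quorum, read_quorum, old_version=1, new_version=2):
    """Simulate a read that contacts the replicas least likely to hold the write."""
    replicas = [old_version] * n_replicas
    for index in range(write_quorum):
        replicas[index] = new_version  # the write was acknowledged by these replicas
    contacted = replicas[n_replicas - read_quorum:]
    return max(contacted)  # the reader keeps the newest version it saw
```

With three replicas, a write quorum of one and a read quorum of one can return the old version, which is the situation relational-era application logic tends not to anticipate.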
Downtime tolerance vs. migration complexity. Near-zero downtime migrations using CDC require the source database to have transaction log access enabled, log retention long enough to cover replication lag, and adequate network bandwidth between source and target. Organizations with strict maintenance window policies (e.g., financial trading systems) may first need to re-enable binlog or redo log access that was previously disabled for performance reasons, which adds configuration work and risk before the migration can start.
Compliance constraints on managed services. HIPAA-covered entities and operators of FedRAMP-scoped workloads must confirm that the specific managed database service holds the appropriate authorization. Per the FedRAMP Marketplace listings, not all managed database service tiers carry FedRAMP High authorization. Choosing a managed service tier without the correct authorization creates compliance gaps even if the cloud provider's infrastructure is authorized. The HIPAA compliant cloud migration and FedRAMP cloud migration pages document authorization scope boundaries.
Common misconceptions
Misconception: A database backup restore is equivalent to a migration. A backup restore copies data but does not validate application query compatibility, resolve schema differences, migrate user accounts and permissions, or update connection configurations. Restore-only approaches routinely leave orphaned users, broken stored procedures, and misconfigured character sets.
Misconception: Managed database services are always cheaper than self-managed. Managed services charge premium rates for convenience features. Amazon RDS Multi-AZ instances carry approximately 2x the cost of equivalent single-AZ deployments. For predictable, steady-state workloads with experienced DBAs on staff, self-managed IaaS deployments can produce lower total cost of ownership over a 3-year period.
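The comparison can be made concrete with simple arithmetic. The sketch below applies the approximate 2x Multi-AZ multiplier mentioned above; every other input (hourly rates, VM count, DBA hours and cost) is a hypothetical placeholder, and the model ignores storage, I/O, backup, and licensing charges.

```python
HOURS_PER_YEAR = 8760


def managed_cost(hourly_rate, multi_az, years=3):
    """Instance cost of a managed deployment; Multi-AZ is modeled as roughly 2x."""
    multiplier = 2.0 if multi_az else 1.0
    return hourly_rate * multiplier * HOURS_PER_YEAR * years


def self_managed_cost(vm_hourly_rate, vm_count, dba_hours_per_month, dba_hourly_cost, years=3):
    """VM cost plus the DBA labor that a managed service would otherwise absorb."""
    compute = vm_hourly_rate * vm_count * HOURS_PER_YEAR * years
    labor = dba_hours_per_month * 12 * years * dba_hourly_cost
    return compute + labor


# Hypothetical inputs: a 0.50/hour managed instance versus two 0.25/hour VMs
# that need 10 DBA hours per month at 80.00 per hour.
managed_total = managed_cost(0.5, multi_az=True)
self_managed_total = self_managed_cost(0.25, 2, 10, 80.0)
```

With these placeholder inputs the labor term dominates the self-managed total, which is why the outcome depends heavily on whether DBA capacity is already on staff and how many hours the workload really consumes.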
Misconception: NoSQL migration eliminates schema management. NoSQL databases are schema-on-read rather than schema-on-write, but this does not eliminate schema governance. Without enforced schema validation, document databases accumulate inconsistent field naming, mixed data types for the same logical field, and missing required attributes—problems that are harder to detect and remediate than relational schema violations.
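A lightweight form of schema governance is to validate documents against an expected shape before they are written. The sketch below checks for missing fields and mixed types; the `ORDER_SCHEMA` fields are hypothetical, and production systems would more likely rely on the validation features of the target database or a JSON Schema library.

```python
# Hypothetical schema for an "orders" collection: field name -> allowed types.
ORDER_SCHEMA = {
    "order_id": (str,),
    "customer_id": (str,),
    "amount": (int, float),
}


def validate_document(doc, schema):
    """Return a list of problems found in one document (empty when valid)."""
    problems = []
    for field, allowed in schema.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
            continue
        value = doc[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, allowed):
            problems.append(f"wrong type for {field}: {type(value).__name__}")
    return problems
```

Running such a check over a sample of source documents before migration gives an early count of the inconsistencies that would otherwise surface as application errors on the target.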
Misconception: CDC replication guarantees zero data loss. CDC captures changes from transaction logs, but log retention policies on the source can allow gaps if the CDC consumer falls behind the log rotation window. AWS DMS documentation explicitly notes that if the CDC lag exceeds the source database's log retention period, the replication task must restart from a new full load.
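The retention condition lends itself to a simple monitoring check: compare current replication lag against the source's log retention period and raise a warning well before the two meet. The function name, status labels, and the 50% warning threshold are assumptions chosen for illustration.

```python
from datetime import timedelta


def cdc_gap_risk(lag, log_retention, warning_fraction=0.5):
    """Classify replication lag relative to the source's log retention period."""
    if lag >= log_retention:
        # Required log segments may already have been purged from the source.
        return "full-reload-required"
    if lag >= log_retention * warning_fraction:
        return "at-risk"
    return "ok"
```

For example, with 24 hours of retention, a lag of 13 hours would be reported as `at-risk`, leaving time to add capacity or extend retention before a new full load becomes unavoidable.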
Checklist or steps (non-advisory)
The following steps represent the standard sequence documented in AWS, Google Cloud, and Azure migration guides for database migration projects.
- Inventory source databases — catalog engine type, version, size (GB), schema object counts, active connection counts, and backup schedules.
- Identify dependencies — map all applications, ETL pipelines, and reporting tools that reference the source database connection string.
- Select target service model — choose between IaaS (self-managed on VM), PaaS (managed service), or DBaaS (serverless options such as Aurora Serverless or Firestore).
- Run schema compatibility assessment — execute AWS SCT, ora2pg, or equivalent tool to generate an assessment report with conversion complexity scores.
- Configure source for CDC — enable binary logging (MySQL), supplemental logging (Oracle), or CDC capture (SQL Server) on the source instance.
- Provision target environment — create target database instance, configure parameter groups, security groups, subnet groups, and encryption at rest.
- Execute full load — run the bulk data transfer from source to target; record start and end timestamps and row counts per table.
- Validate full load — compare row counts, primary key ranges, and checksum aggregates between source and target.
- Enable CDC replication — start the ongoing replication task and monitor lag metrics continuously.
- Test application connectivity — point a non-production application instance at the target database; execute functional and performance test suites. Reference cloud migration testing strategies for structured test coverage approaches.
- Execute cutover — stop writes to source, apply final CDC batch, update production connection strings, and monitor error rates for 15–30 minutes post-cutover.
- Decommission source — retain source database in read-only or snapshot state for a documented rollback window before final decommission.
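The ordering of the steps above matters: for instance, cutover should not start before the full load has been validated. One way to enforce the sequence in tooling is a small runbook object that refuses out-of-order completion. The step identifiers paraphrase the checklist and are hypothetical, as is the `MigrationRunbook` class.

```python
# Step identifiers paraphrase the checklist above; the names are hypothetical.
STEPS = (
    "inventory_sources",
    "identify_dependencies",
    "select_target_model",
    "assess_schema",
    "configure_cdc_source",
    "provision_target",
    "full_load",
    "validate_full_load",
    "enable_cdc",
    "test_application",
    "cutover",
    "decommission_source",
)


class MigrationRunbook:
    """Tracks progress through STEPS and rejects out-of-order completion."""

    def __init__(self):
        self.completed = []

    @property
    def next_step(self):
        if len(self.completed) == len(STEPS):
            return None
        return STEPS[len(self.completed)]

    def complete(self, step):
        if step != self.next_step:
            raise RuntimeError(
                f"cannot complete {step!r}; next step is {self.next_step!r}"
            )
        self.completed.append(step)
```

A strictly linear sequence is a simplification; in practice some steps, such as provisioning the target and configuring the source for CDC, can proceed in parallel.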
Reference table or matrix
| Migration Type | Source → Target Example | Schema Conversion Required | Downtime Model | Compliance Considerations |
|---|---|---|---|---|
| Homogeneous (SQL) | MySQL 8.0 → RDS MySQL 8.0 | Minimal | Near-zero (CDC) | Standard IAM + encryption at rest |
| Heterogeneous (SQL) | Oracle 19c → Aurora PostgreSQL | High (PL/SQL → PL/pgSQL) | Planned window or CDC | Verify engine-level FedRAMP/HIPAA scope |
| Lift-and-shift (SQL) | SQL Server → SQL Server on EC2 | None | Backup/restore window | License compliance (BYOL rules apply) |
| Replatform (SQL) | SQL Server → Azure SQL MI | Low–Medium | CDC or backup | PCI DSS SAQ D scope review required |
| Document NoSQL | MongoDB → Atlas or Firestore | Schema-on-read normalization | Online sync tools | Encryption in transit + field-level encryption |
| Key-value NoSQL | Redis on-prem → ElastiCache | Key namespace mapping | Replication + cutover | Data residency region selection |
| Wide-column NoSQL | Cassandra → Azure Cosmos DB for Apache Cassandra | CQL compatibility check | Dual-write migration | Eventual consistency tolerance required |
| Data warehouse | Teradata → BigQuery or Redshift | SQL dialect conversion | Bulk export/import | Column-level access control mapping |
Service model selection guidance is also covered in the cloud migration strategy frameworks reference, and the cloud migration tools comparison page provides engine-specific tool coverage.
References
- NIST SP 800-145: The NIST Definition of Cloud Computing
- AWS Migration Whitepaper — The 7 Rs Migration Strategies
- AWS Database Migration Service Documentation
- Microsoft Lifecycle Policy — SQL Server 2012
- PCI Security Standards Council — PCI DSS Document Library
- FedRAMP Marketplace — Authorized Cloud Services
- Google Cloud Database Migration Service Documentation
- Azure Database Migration Service Documentation