Posts

Showing posts from September, 2020

Google Cloud

Scopes: Global (networks), Region (static external IPs), Zone (disks, VMs). Project (ID, name, number): any Google Cloud resource must belong to a project. A project ID is unique and can never be reused. A project can be seen as a workspace. See also: Comparison of services among various cloud providers. Cloud SQL is similar to AWS RDS. Cloud Storage is similar to S3; it has multiple tiers (storage classes) and supports Object Lifecycle Management to transition objects from one storage class to another based on certain criteria. Workflow is similar to AWS Glue or Azure Data Factory. BigQuery compares to Redshift in AWS. Pub/Sub + Dataflow compares to Kinesis in AWS and Event Hubs in Azure.
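To make the Object Lifecycle Management point concrete, here is a minimal sketch that builds a lifecycle configuration with one rule moving objects to a colder storage class after a given age. The helper name `lifecycle_config`, the 30-day threshold, and the NEARLINE target are illustrative assumptions; the JSON layout follows my understanding of the format Cloud Storage accepts for lifecycle configurations, so check it against the official documentation before use.

```python
import json


def lifecycle_config(age_days: int, target_class: str) -> dict:
    """Build a lifecycle configuration with a single SetStorageClass rule.

    Hypothetical helper for illustration; it only builds the document and
    does not call any Google Cloud API.
    """
    return {
        "lifecycle": {
            "rule": [
                {
                    # What to do with matching objects.
                    "action": {
                        "type": "SetStorageClass",
                        "storageClass": target_class,
                    },
                    # When the rule applies: object age in days.
                    "condition": {"age": age_days},
                }
            ]
        }
    }


# Assumed example values: move objects to NEARLINE once they are 30 days old.
config = lifecycle_config(30, "NEARLINE")
print(json.dumps(config, indent=2))
```

The printed JSON could be saved to a file and applied to a bucket with the Cloud Storage tooling of your choice.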

Enterprise Integration Patterns

Kerberos

Image
Kerberos is an authentication protocol for trusted hosts on untrusted networks. Authentication among the various parties happens as shown in the diagram, which is based on the video presentation "Kerberos Authentication Explained | A deep dive".

S3 Transfer Accelerator

S3 Transfer Accelerator takes advantage of Amazon CloudFront’s globally distributed edge locations. It is used for fast, easy, and secure transfer of files over long distances between a client and an S3 bucket.
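Clients use the accelerated path by sending requests to a dedicated endpoint instead of the regular bucket endpoint. The sketch below only builds that endpoint URL; the helper name `accelerate_endpoint` and the bucket name are hypothetical, and the bucket is assumed to already have acceleration enabled.

```python
def accelerate_endpoint(bucket: str, dualstack: bool = False) -> str:
    """Return the Transfer Acceleration endpoint URL for a bucket.

    Hypothetical helper for illustration; it performs no network calls.
    """
    # Acceleration requires a DNS-compliant bucket name without periods.
    if "." in bucket:
        raise ValueError("bucket names containing '.' cannot use acceleration")
    host = "s3-accelerate.dualstack" if dualstack else "s3-accelerate"
    return f"https://{bucket}.{host}.amazonaws.com"


# "my-upload-bucket" is an assumed example name.
url = accelerate_endpoint("my-upload-bucket")
print(url)
```

An SDK or CLI would normally be configured to use this endpoint rather than building the URL by hand; the function is here only to show what changes on the client side.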

AWS DirectConnect

AWS Direct Connect is a service to connect your on-premises systems with AWS without going through the internet. You may need this specifically if you need high speed and/or low latency.

AWS GuardDuty

AWS GuardDuty is a threat detection service that continuously analyzes event log data. It can monitor VPC Flow Logs, CloudTrail event logs, and DNS logs, and it integrates with Amazon CloudWatch Events. It generates alerts.

AWS DataSync

AWS DataSync is an online data transfer service that simplifies, automates, and accelerates copying large amounts of data to and from AWS storage services over the internet or AWS Direct Connect. DataSync can copy data between Network File System (NFS), Server Message Block (SMB) file servers, self-managed object storage, or AWS Snowcone, and Amazon Simple Storage Service (Amazon S3) buckets, Amazon Elastic File System (Amazon EFS) file systems, and Amazon FSx for Windows File Server file systems. DataSync includes encryption and integrity validation to help make sure your data arrives securely and intact. DataSync does both full initial copies and incremental transfers of changing data.

Elasticsearch

What is Elasticsearch? ELK stack: Elasticsearch, Logstash, Kibana, and Beats.

AWS DynamoDB

Global & Local Secondary Indexes: used when data access patterns cannot be accommodated using the primary key.
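As a rough illustration, the sketch below builds a table definition with one local and one global secondary index. The table, attribute, and index names are all made up for the example; the dictionary keys follow the DynamoDB CreateTable request shape, and nothing is sent to AWS here.

```python
def orders_table_definition() -> dict:
    """Return a CreateTable-style request for a hypothetical Orders table."""
    return {
        "TableName": "Orders",
        # Only attributes used in some key schema are declared.
        "AttributeDefinitions": [
            {"AttributeName": "customer_id", "AttributeType": "S"},
            {"AttributeName": "order_id", "AttributeType": "S"},
            {"AttributeName": "order_date", "AttributeType": "S"},
            {"AttributeName": "status", "AttributeType": "S"},
        ],
        # Primary key: look up an order by customer and order id.
        "KeySchema": [
            {"AttributeName": "customer_id", "KeyType": "HASH"},
            {"AttributeName": "order_id", "KeyType": "RANGE"},
        ],
        # LSI: same partition key as the table, different sort key.
        "LocalSecondaryIndexes": [
            {
                "IndexName": "by_customer_date",
                "KeySchema": [
                    {"AttributeName": "customer_id", "KeyType": "HASH"},
                    {"AttributeName": "order_date", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        # GSI: a different partition key, for queries across customers.
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "by_status_date",
                "KeySchema": [
                    {"AttributeName": "status", "KeyType": "HASH"},
                    {"AttributeName": "order_date", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


definition = orders_table_definition()
print([index["IndexName"] for index in definition["GlobalSecondaryIndexes"]])
```

The primary key answers "orders of customer X", the local index answers "orders of customer X by date", and the global index answers "all orders with status Y by date", which the primary key alone cannot serve. With boto3 installed and credentials configured, the dictionary could be passed as keyword arguments to the DynamoDB client's `create_table` call.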

AWS Big Data Specialty

Ingestion tools, delivery: Guaranteed ordering is provided by all AWS ingestion services except Firehose and SQS (Standard). Exactly-once delivery is provided only by DynamoDB Streams and Amazon SQS (FIFO); all others are at-least-once. AWS Lambda: limited buffering capabilities. AWS EMR: single Availability Zone. AWS Redshift does not support resource-based policies. DynamoDB provides fine-grained access to your tables and data.
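With at-least-once delivery, a consumer can receive the same message more than once, so it usually needs to be idempotent. Below is a minimal sketch of one common approach, deduplicating on a message ID. The message shape (`id`, `body`) and the function name `process_once` are assumptions for illustration, not any AWS SDK API, and a real consumer would keep the seen IDs in durable storage rather than in memory.

```python
def process_once(messages, handler, seen=None):
    """Apply handler to each message body, skipping IDs already seen.

    `seen` is an in-memory set here; it stands in for a durable store.
    """
    seen = set() if seen is None else seen
    results = []
    for message in messages:
        if message["id"] in seen:
            continue  # duplicate delivery, already handled
        seen.add(message["id"])
        results.append(handler(message["body"]))
    return results


# Simulated at-least-once stream: message m1 is delivered twice.
deliveries = [
    {"id": "m1", "body": "a"},
    {"id": "m2", "body": "b"},
    {"id": "m1", "body": "a"},
]
processed = process_once(deliveries, str.upper)
print(processed)  # ['A', 'B']
```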

PowerBI Gateway

References: https://blog.pragmaticworks.com/power-bi-and-data-security-on-premises-data-gateway

AWS GLUE

AWS Glue: batch jobs and ETL, with a minimum schedule interval of 5 minutes; no support for NoSQL stores; not suitable for heterogeneous processing (use AWS Data Pipeline for that). Configurable DPUs (Data Processing Units). It is a fully managed (serverless), scale-out Apache Spark environment and a pay-as-you-go ETL service. It discovers and profiles data via the Glue Data Catalog, generates ETL code to transform data into the target schema, can run the job to load data into the destination, and allows you to configure, orchestrate, and monitor complex data flows. The AWS Glue Data Catalog is Apache Hive Metastore compatible and is a drop-in replacement for the Apache Hive Metastore for big data applications running on Amazon EMR. In the context of updating the metadata, whatever you can do with a Hive DB, you can also do with the AWS Glue Data Catalog. AWS Glue = Data Catalog + flexible scheduler. For supported data sources, see the AWS Glue FAQ. AWS Glue can also be used for complex ETL of streaming data. If focus is on delivery of streaming data u

AWS DMS

AWS DMS: for one-time migration and ongoing replication or change data capture (CDC). The source database remains operational during the migration. For CDC, it uses native database APIs to read change logs from the source database and replays them in the target store. It uses an EC2 instance as the replication instance, which can be scaled up or down.
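The "read the change log and replay it on the target" idea can be sketched with a toy in-memory example. This is not the DMS API: the change-record shape (`op`, `key`, `row`) and the function name `replay` are invented for illustration, and a plain dictionary stands in for the target store.

```python
def replay(change_log, target):
    """Apply change records to the target in log order and return it."""
    for change in change_log:
        op, key = change["op"], change["key"]
        if op in ("insert", "update"):
            target[key] = change["row"]
        elif op == "delete":
            target.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return target


# Target after the initial full load (assumed state).
target = {1: {"name": "alice"}}

# Changes captured on the source after the full load.
change_log = [
    {"op": "insert", "key": 2, "row": {"name": "bob"}},
    {"op": "update", "key": 1, "row": {"name": "alice smith"}},
    {"op": "delete", "key": 2},
]

replay(change_log, target)
print(target)  # {1: {'name': 'alice smith'}}
```

Order matters: the changes are applied in the sequence they occurred on the source, which is why the replay walks the log from start to end.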