Posts

Showing posts from 2020

Spark with Python

References:  https://towardsdatascience.com/pyspark-and-sparksql-basics-6cb4bf967e53 
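A minimal PySpark/SparkSQL sketch along the lines of that article; the input file and column names (people.csv, name, age) are hypothetical:

```python
from pyspark.sql import SparkSession

# Start (or reuse) a local Spark session
spark = SparkSession.builder.appName("pyspark-basics").getOrCreate()

# Load a CSV into a DataFrame; people.csv is a hypothetical input file
df = spark.read.csv("people.csv", header=True, inferSchema=True)

# The DataFrame API and SparkSQL give two views of the same data
df.filter(df.age > 30).show()

df.createOrReplaceTempView("people")
spark.sql("SELECT name, age FROM people WHERE age > 30").show()
```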

Transactions in HIVE

Full ACID semantics at the row level have been supported since Hive 0.13; earlier, transactions were only supported at the partition level.
Atomicity: the entire operation is a single unit; either all of it happens or none of it does.
Consistency: once an operation is completed, every subsequent operation sees its result.
Isolation: one user's operation does not impact another user's.
Durability: once an operation is done, its result remains thereafter.
At present, Hive's isolation is only at the snapshot level. Various DBMSs offer the following isolation levels:
Snapshot: the snapshot of data at the beginning of the transaction is visible throughout the transaction; whatever happens in other transactions is never seen by this one.
Dirty Read: uncommitted updates from other transactions can be seen.
Read Committed: only updates that are committed at the time of the read are seen by this transaction.
Repeatable Read: read locks on data being read and write locks on data being created/updated…
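A hedged sketch (not from the post) of creating and updating a transactional Hive table through PyHive; the host, user, and table names are hypothetical, ACID must be enabled on the cluster, and on older Hive versions the table must be a bucketed ORC table:

```python
from pyhive import hive  # pip install pyhive

# Connect to a (hypothetical) HiveServer2 endpoint
conn = hive.connect(host="hive-server.example.com", port=10000, username="etl")
cursor = conn.cursor()

# Transactional tables are ORC with the 'transactional' property set;
# older Hive versions additionally require bucketing, as shown here
cursor.execute("""
    CREATE TABLE IF NOT EXISTS orders (id INT, status STRING)
    CLUSTERED BY (id) INTO 4 BUCKETS
    STORED AS ORC
    TBLPROPERTIES ('transactional'='true')
""")

# Row-level UPDATE/DELETE are only allowed on transactional tables
cursor.execute("UPDATE orders SET status = 'shipped' WHERE id = 42")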

Hadoop/Hive Data Ingestion

Data Ingestion:
Files: stage the files and use the Hadoop/Hive CLI.
Database: Sqoop; full dumps for smaller tables and CDC only for larger ones (10M+ rows); use the -m option (parallel mappers) for large DB dumps. NiFi is another option.
Streaming: NiFi, Flume, StreamSets. NiFi is popular.
File Ingestion (CSV into TEXTFILE):
Overwrite: move the file to HDFS and create an external TEXTFILE table on top of the HDFS location. You can also create the table and use "LOAD DATA LOCAL INPATH '<localpath>' OVERWRITE INTO TABLE <tablename>". This approach is handy for internal tables where no location is specified and you don't know the HDFS warehouse location where the table was created. You can use the LOAD DATA command for loading data from local as well as HDFS files (drop LOCAL for an HDFS source). A sketch of both variants follows this list.
Append: you can still use "LOAD DATA INPATH '<path>' INTO TABLE <tablename>" (without OVERWRITE), or create a temporary table using the overwrite approach and then insert into the original table from the temporary table. The same approach works for partitioned tables…
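A hedged sketch of the two LOAD DATA variants above, issued through PyHive; host, paths, and table names are hypothetical:

```python
from pyhive import hive  # same PyHive connection style; all names are hypothetical

conn = hive.connect(host="hive-server.example.com", port=10000, username="etl")
cursor = conn.cursor()

# Overwrite: replace the table contents with a local CSV
cursor.execute(
    "LOAD DATA LOCAL INPATH '/tmp/users.csv' OVERWRITE INTO TABLE users"
)

# Append: omit OVERWRITE; this source file already sits in HDFS, so no LOCAL
cursor.execute(
    "LOAD DATA INPATH '/landing/users_delta.csv' INTO TABLE users"
)
```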

HCFS

 Hadoop Compatible File System

Data Lakes

Advances in cloud computing and big data processing enabled data lakes, and they are becoming a natural choice for organizations to harness the power of data. A data lake creates a central repository for all sorts of data: structured, semi-structured, or unstructured. Data lakes store data from all sorts of sources and in all sorts of formats. No preparation is required before storing the data, and huge quantities of data can be stored in a cost-effective manner. Data pipelines are set up to cleanse and transform the data. Data can be consumed in multiple ways: via interactive queries, or by exporting into data warehousing or business intelligence solutions.
Functional Areas:
Data Ingestion or Collection: batch or streaming
Catalog & Search: data cataloging, metadata creation, tagging
Manage & Secure Data: cost-effective storage; security (access restrictions and encryption at rest)
Processing: cleansing and transformation, ETL or ELT pipelines, raw data to…

EC2-Classic

This was the initial EC2 platform. In it, all EC2 instances were launched into a flat network shared by all customers; there was no concept of a VPC. Accounts created after 2013-12-04 do not have support for EC2-Classic.

ELBs

ALB is a layer 7 load balancer. It is context/application aware and capable of content-based routing: it examines the content of the request and forwards it accordingly. It also supports AWS Outposts. NLB is a layer 3/4 load balancer. It does not know the application it is load balancing; it just forwards traffic based on connection parameters (source IP, source port, TCP sequence number). Each TCP connection can have a different port and sequence number even if it comes from the same client, so requests from the same client can be forwarded to different targets, though traffic from a single connection goes to the same target for the entire connection duration. The only difference for UDP traffic is that it has no sequence number, so packets are forwarded based on source IP and source port. This kind of routing does not require a lot of processing, so it is very fast and can handle millions of requests per second. NLB also provides one static and elastic IP (elastic IP only if internet-facing) per zone. NLB only pres…

IPv6

IPv6 - 128-bit address space (8 groups of 4 hexadecimal digits separated by colons). IPv4 - 32-bit (4 groups of integers between 0-255 separated by dots). The concept of private networks helped deal with the scarcity of IP addresses in the IPv4 protocol. Machines on a private network didn't have their own public IPs but rather used a NAT server for this. NAT - Network Address Translation - is used to facilitate communication from machines on private networks to the Internet. As machines on private networks don't have public IPs, their traffic goes through NAT (a router or firewall), which takes the private source IP, assigns its own public IP address to the traffic, and sends it to the external public IP. It does the reverse for the reply that comes back to it from the external system.
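A small illustrative sketch using Python's standard ipaddress module to make the address formats and the private-range/NAT idea concrete:

```python
import ipaddress

# IPv4: 32 bits, 4 dotted-decimal groups; this one is in an RFC 1918 private range
v4 = ipaddress.ip_address("192.168.1.10")
print(v4.version, v4.is_private)       # 4 True  -> would sit behind NAT

# IPv6: 128 bits, 8 colon-separated groups of 4 hex digits (here abbreviated)
v6 = ipaddress.ip_address("2001:4860:4860::8888")
print(v6.version, v6.is_global)        # 6 True  -> globally routable, no NAT needed
print(v6.exploded)                     # full 8-group form
```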

HTTP/2

Many efforts were made to address HTTP/1.1 issues, and HTTP/2 originated out of those, esp. Google's SPDY protocol. HTTP/2 extends HTTP/1.1 rather than replacing it and is fully backward compatible; it is an implementation change while keeping the interface the same. Major differences from HTTP/1.1: Server Push - allows servers to push additional content for page loading even without the browser requesting it. Request Multiplexing - allows multiple requests over the same connection. Request prioritization. Header compression. Binary Message Framing - more efficient processing of messages through the use of binary message framing. For more details see https://developers.google.com/web/fundamentals/performance/http2 . There was one major problem with HTTP/1.1, called "head-of-line blocking". HTTP/2 fixed it by multiplexing and prioritizing requests over a connection, but the problem still remains at the TCP level: one lost packet in the TCP stream makes all streams wait until that packet is retransmitted.
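An illustrative, hedged sketch of negotiating HTTP/2 from Python using the httpx client (one of several HTTP/2-capable clients); it assumes httpx is installed with its http2 extra:

```python
# Requires: pip install 'httpx[http2]'
import httpx

# HTTP/2 is negotiated via ALPN during the TLS handshake; the client falls
# back to HTTP/1.1 if the server does not support it
with httpx.Client(http2=True) as client:
    resp = client.get("https://www.google.com/")
    print(resp.http_version)   # "HTTP/2" if negotiation succeeded
```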

gRPC

gRPC - a Remote Procedure Call framework used for high-performance communication between services. It is an alternative to REST, esp. for microservice communication. gRPC uses the HTTP/2 protocol. It uses protocol buffers for encoding data, which is lighter and more efficient than JSON/XML in terms of bandwidth consumed during data transfer. HTTP/2 also allows multiplexing, so multiple requests/responses can be served at the same time rather than sequentially. gRPC is built to overcome the limitations of REST in microservice communication. Understanding gRPC

TCP vs UDP

TCP - Transmission Control Protocol: connection-based, flow control, error checking, in-order delivery, guaranteed delivery, relatively slower (FTP, HTTP/HTTPS, SSH, POP/IMAP, SMTP, DNS). UDP - User Datagram Protocol: connectionless, no flow control, no error checking, packet loss is possible and packets can arrive out of order, faster (VPN tunneling, video streaming, online games, live broadcasting, DNS, VOIP, TFTP).
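A minimal sketch contrasting the two socket types in Python; the host and ports are placeholders (port 9 is the discard service and may be filtered):

```python
import socket

# TCP: connection-oriented; a handshake precedes any data, delivery is ordered and reliable
tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp.connect(("example.com", 80))                       # 3-way handshake happens here
tcp.sendall(b"HEAD / HTTP/1.0\r\nHost: example.com\r\n\r\n")
print(tcp.recv(1024))                                  # bytes arrive in order
tcp.close()

# UDP: connectionless; just fire a datagram, no handshake, no delivery guarantee
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp.sendto(b"ping", ("example.com", 9))                # may be silently dropped
udp.close()
```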

SSL vs. TLS

SSL and TLS are both cryptographic protocols, but SSL is the older version and has been replaced by TLS. These protocols allow authenticating the server and encrypting the traffic between server and client. Nice article on how SSL cryptography works: SSL Cryptography. In short, the server sends its public key to the browser/client. The browser generates a symmetric session key and encrypts it using the server's public key. The server decrypts it and retrieves the symmetric session key. Now the browser and server both communicate by encrypting/decrypting data using the symmetric session key, which is used for that session only. In short, SSL ends up using both asymmetric and symmetric encryption. Asymmetric (public key) encryption algorithms include: RSA (the public key is a product of two large primes and the private key is those two large prime numbers), ECC (Elliptic Curve Cryptography - relies on the fact that it is impractical to find the discrete logarithm of a random elliptic-curve element in relation to a publicly known base point…
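A short illustrative sketch of the client side of this handshake using Python's standard ssl module; the host is a placeholder:

```python
import socket
import ssl

# Perform a TLS handshake and inspect what was negotiated
ctx = ssl.create_default_context()          # verifies the server certificate chain
with socket.create_connection(("example.com", 443)) as sock:
    with ctx.wrap_socket(sock, server_hostname="example.com") as tls:
        print(tls.version())                # e.g. 'TLSv1.3'
        print(tls.cipher())                 # negotiated cipher suite
        # After the handshake, application data on this socket is encrypted
        # with the symmetric session key agreed during key exchange
```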

OSI

The purpose of the OSI (Open Systems Interconnection) Model is to provide a set of design standards for equipment manufacturers so their equipment can communicate with each other. Nice explanation of the various layers: OSI & TCP/IP Models. More details on OSI layers and how different protocols and devices fit into them: OSI Layers, Protocols, Devices. Routers work at the Network layer (3), switches work at the Data Link layer (2), and hubs and cables work at the Physical layer (1). Network Devices: Routers, Switches, Hubs. One key difference between a switch and a hub is that a switch switches data frames intelligently to the port connected to the destination device, while a hub sends them to all ports. A hub tends to cause traffic congestion in the network, and data frames end up on devices that are not their intended destination and have to process them unnecessarily only to figure out they are not meant for them. A router is needed to route traffic between different networks…

Blue Green Deployment

A deployment strategy for minimal or no downtime. This strategy becomes more feasible and relevant in cloud environments, where infrastructure provisioning is automated. Let us call the existing prod environment Blue.
1. Create a clone of it; call it Green.
2. Switch the prod traffic to Green.
3. Update Blue with the changes and test.
4. When everything looks OK, switch the traffic back to Blue.
5. Terminate Green.
A sketch of the traffic-switch step on AWS follows.
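A hedged boto3 sketch of shifting traffic between two ALB target groups via listener weights; all ARNs are placeholders and both target groups are assumed to already exist:

```python
import boto3

elbv2 = boto3.client("elbv2")

def shift_traffic(listener_arn, blue_tg_arn, green_tg_arn, green_weight):
    """Route green_weight% of traffic to Green, the rest to Blue."""
    elbv2.modify_listener(
        ListenerArn=listener_arn,
        DefaultActions=[{
            "Type": "forward",
            "ForwardConfig": {"TargetGroups": [
                {"TargetGroupArn": blue_tg_arn, "Weight": 100 - green_weight},
                {"TargetGroupArn": green_tg_arn, "Weight": green_weight},
            ]},
        }],
    )

# Step 2 of the list above: send all prod traffic to Green (placeholder ARNs)
shift_traffic("arn:aws:...listener", "arn:aws:...blue-tg", "arn:aws:...green-tg", 100)
```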

Utilities/Tools

Below is a list of tools/sites which can be handy at times: mailinator.com, awwapp.com, mockable.io

RTO & RPO

RTO - Recovery Time Objective: the maximum acceptable time to restore service after an outage. RPO - Recovery Point Objective: the maximum acceptable amount of data loss measured in time, i.e., how far back the last recoverable copy may be.

Industry Regulations

There are numerous regulations/guidelines across industries; a majority of them are listed here as an executive summary.
Finance
SOX: The Sarbanes-Oxley Act of 2002 came in response to financial scandals in the early 2000s involving publicly traded companies such as Enron Corporation, Tyco International plc, and WorldCom.
GLBA: The GLBA was an attempt to update and modernize the financial industry. Passed in 1999 under the Clinton administration, it allowed commercial banks to provide financial services like investments, insurance, etc. It is also known as the repeal of the Glass-Steagall Act of 1933.
PCI DSS: Security guidelines for the payment card industry. You can find the latest version of PCI DSS in the PCI Document Library.
GDPR: GDPR lays out the basic premise that individuals should have control over their own data, and places new restrictions on financial institutions and other organizations seeking to store, process, or transmit that data (FINANCE AND GDPR: WHAT Y…

Multiple python versions on Mac

I had macOS Mojave. It needed multiple versions of Python; esp. I needed a version between 2.6.x and 3.0.x. I came across this nice article: https://medium.com/faun/pyenv-multi-version-python-development-on-mac-578736fb91aa
I followed all the steps, but when I tried to install 2.7.0 and 2.7.1 I ran into this issue:
ERROR: The Python ssl extension was not compiled. Missing the OpenSSL lib?
I had OpenSSL installed, but it seems it was not the version the build was looking for. Eventually I was able to install Python 2.7.15. It looks like 2.7.15 used the OpenSSL lib I had installed, hence no issues.
bash-3.2$ pyenv install 2.7.15
python-build: use openssl@1.1 from homebrew
python-build: use readline from homebrew
Downloading Python-2.7.15.tar.xz...
-> https://www.python.org/ftp/python/2.7.15/Python-2.7.15.tar.xz
Installing Python-2.7.15...
python-build: use readline from homebrew
python-build: use zlib from xcode sdk
Installed Python-2.7.15 to /Users/glbairwa/.pyenv…

REST API Calls from JavaScript

https://www.freecodecamp.org/news/here-is-the-most-popular-ways-to-make-an-http-request-in-javascript-954ce8c95aaa/

Microsoft Active Directory

I followed the following blog post for installing Active Directory in my own AWS account: https://www.ecloudture.com/en/use-ec2-to-build-windows-active-directory-2/ I ran into one issue: when I tried to add a machine (PC01 in the post) into the domain (ADLAB.com), I got an error. When I looked into AD Server Manager, on the AD DS service, DFS Replication was failing with the following error: Additional Information: Error: 1355 (The specified domain either does not exist or could not be contacted.) The DNS service was also showing an error/warning complaining that it was waiting for some signal from AD DS. After some research I came across this post: https://community.spiceworks.com/topic/1726627-the-specified-domain-either-does-not-exist-or-could-not-be-contacted One of the answers on this post recommends the following procedure: https://support.microsoft.com/en-us/kb/947022 I followed the procedure and after that I was able to add the machine to the domain. I am not sure whether following t…

Hashing

Interesting article on Hash Functions. Consistent Hashing is a strategy used in system design, esp. for scaling backend data stores. The typical hash-based sharding used to horizontally scale databases doesn't handle adding/removing shards efficiently; consistent hash sharding takes care of that. Here is a nice article comparing different sharding strategies.
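A minimal consistent-hash ring sketch in Python (illustrative, not from the referenced articles); virtual nodes smooth out the key distribution, and adding or removing a node remaps only the keys between ring neighbors:

```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes          # virtual nodes per physical node
        self.ring = []                # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def get(self, key):
        # The first virtual node clockwise from the key's hash owns the key
        i = bisect.bisect(self.ring, (self._hash(key),)) % len(self.ring)
        return self.ring[i][1]

ring = ConsistentHashRing(["db1", "db2", "db3"])
print(ring.get("user:42"))   # adding "db4" later remaps only ~1/4 of the keys
```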

Angular

Angular. Recently came across this framework. It lets you do lots of cool things which are difficult otherwise.
1. Getting JavaScript object values written to the DOM or HTML field names:

<label ng-repeat="obj in objCollection" class="radio-inline">
  <input name="obj" id="obj-{{$index}}" value="{{obj.value}}" ng-model="scopedObj.val" type="radio">
  {{obj.label}}
</label>

objCollection is the collection of obj objects. objCollection and scopedObj are both in the scope.
2. If there is an object available in scope, you can directly access it using a URL.
3. Similar to jQuery, you can make AJAX calls easily:

var uri = "http://mydomain/uri";
var $http = angular.element('html').injector().get('$http');
$http.get(uri, {
}).success(function(data, status, headers, config) {
    console.log(data);
});

Okta

Okta is a cloud-based Identity and Access Management platform. Below are some notes from a whitepaper published on the Okta website. Okta can be used as a foundation for implementing Zero Trust within organizations. Zero Trust is a security framework developed by Forrester Research: never trust, always verify. The Zero Trust framework later evolved into ZTX (Zero Trust eXtended Ecosystem). The focus has shifted from the network perimeter to access (who is accessing the system), i.e., to identity and access control. There are four stages of Zero Trust implementation:
Fragmented Identity
Unified IAM
Contextual Access: context-based access policies
Adaptive Workforce: risk-based access policies, continuous and adaptive authn and authz
Services offered by Okta:
Okta Universal Directory
Okta SSO
Okta Advanced Server Access
Okta Adaptive MFA
Okta also integrates with a number of vendors in the security ecosystem. That helps in pinpointing the root cause of compromise…

Azure Functions

Azure Functions is the Azure equivalent of AWS Lambda. Azure Functions can be triggered by HTTP and can respond to HTTP requests, but as of now there is no input binding for HTTP; a function cannot take input from HTTP via a binding.

Google Cloud

Scopes - Global (network), Region (static external IP), Zone (disks, VMs).
Project (id, name, number) - any GC resource must belong to a project. An ID is unique and can never be reused. A project can be seen as a workspace. Click here to see a comparison of services among various cloud providers.
Cloud SQL is similar to AWS RDS.
Cloud Storage is similar to S3; it has multiple tiers and supports Object Lifecycle Management to transition objects from one storage class to another based on certain criteria.
Workflow is similar to AWS Glue or Azure Data Factory.
BigQuery compares to Redshift in AWS.
Pub/Sub + Dataflow compares to Kinesis in AWS and Event Hubs in Azure.

Enterprise Integration Patterns

Enterprise Integration Patterns

Kerberos

[Diagram: Kerberos authentication flow]
Kerberos is an authentication protocol for trusted hosts on untrusted networks. The authentication among the various parties happens as shown in the diagram. The diagram is based on the video presentation at the following link: Kerberos Authentication Explained | A deep dive

S3 Transfer Acceleration

S3 Transfer Acceleration takes advantage of Amazon CloudFront's globally distributed edge locations. It is used for fast, easy, and secure transfer of files over long distances between a client and an S3 bucket.

AWS DirectConnect

AWS Direct Connect is a service to connect your on-premises systems with AWS without going through the internet. You may need this specifically if you need high speed and/or low latency.

AWS GuardDuty

AWS GuardDuty is a threat detection service that works by continuously analyzing event log data. It can monitor VPC Flow Logs, CloudTrail event logs, and DNS logs, and it integrates with Amazon CloudWatch Events. It generates alerts (findings).

AWS DataSync

AWS DataSync is an online data transfer service that simplifies, automates, and accelerates copying large amounts of data to and from AWS storage services over the internet or AWS Direct Connect. DataSync can copy data between Network File System (NFS) and Server Message Block (SMB) file servers, self-managed object storage, or AWS Snowcone, and Amazon Simple Storage Service (Amazon S3) buckets, Amazon Elastic File System (Amazon EFS) file systems, and Amazon FSx for Windows File Server file systems. DataSync includes encryption and integrity validation to help make sure your data arrives securely and intact. DataSync does both full initial copies and incremental transfers of changing data.

Elasticsearch

What is Elasticsearch
ELK stack - Elasticsearch, Logstash, and Kibana; with Beats added, it is known as the Elastic Stack.

AWS DynamoDB

Global & Local Secondary Indexes: used when data access patterns cannot be accommodated using the primary keys alone.
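A hedged boto3 sketch of declaring a global secondary index at table-creation time; the table, attribute, and index names are hypothetical:

```python
import boto3

dynamodb = boto3.client("dynamodb")
dynamodb.create_table(
    TableName="orders",
    AttributeDefinitions=[
        {"AttributeName": "order_id", "AttributeType": "S"},
        {"AttributeName": "customer_id", "AttributeType": "S"},
    ],
    KeySchema=[{"AttributeName": "order_id", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",
    GlobalSecondaryIndexes=[{
        # Lets you query orders by customer, an access pattern the primary key can't serve
        "IndexName": "by-customer",
        "KeySchema": [{"AttributeName": "customer_id", "KeyType": "HASH"}],
        "Projection": {"ProjectionType": "ALL"},
    }],
)
```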

AWS Big Data Specialty

Ingestion Tools Delivery:
Guaranteed ordering: delivered by all AWS services except Firehose and SQS (Standard).
Exactly-once delivery: only DynamoDB Streams and Amazon SQS (FIFO); all others are at-least-once.
AWS Lambda: limited buffering capabilities.
AWS EMR: single Availability Zone.
AWS Redshift does not support resource-based policies.
DynamoDB provides fine-grained access to your tables and data.

PowerBI Gateway

References: https://blog.pragmaticworks.com/power-bi-and-data-security-on-premises-data-gateway

AWS GLUE

AWS Glue - batch jobs, ETL, minimum 5-minute intervals, no support for NoSQL stores; it is not suitable for heterogeneous processing (use AWS Data Pipeline for that). Configurable DPUs (Data Processing Units), fully managed (serverless), scale-out Apache Spark environment, pay-as-you-go ETL service. It discovers and profiles data via the Glue Data Catalog, generates ETL code to transform data into the target schema, can run the job to load data into the destination, and allows you to configure, orchestrate, and monitor complex data flows. The AWS Glue Data Catalog is Apache Hive Metastore compatible and is a drop-in replacement for the Apache Hive Metastore for big data applications running on Amazon EMR. In the context of updating the metadata, whatever you can do with a Hive DB, you can also do with the AWS Glue Data Catalog. AWS Glue = Data Catalog + Flexible Scheduler. For supported data sources see the AWS Glue FAQ. AWS Glue can also be used for complex ETL of streaming data. If the focus is on delivery of streaming data, u…

AWS DMS

AWS DMS: for one-time migration and ongoing replication, i.e., change data capture (CDC), with no impact on the source database. For CDC it uses native database APIs to read change logs from the source DB and replays them in the target store. It uses EC2 as the replication instance, which can be scaled up/down.

DynamoDB Streams

DynamoDB Streams captures a time-ordered sequence of item-level modifications in any DynamoDB table and stores this information in a log for up to 24 hours.  
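A hedged sketch of reading such a stream directly with boto3's low-level API (the stream ARN is a placeholder; Lambda triggers or the Kinesis adapter are the more common consumers):

```python
import boto3

streams = boto3.client("dynamodbstreams")
stream_arn = "arn:aws:dynamodb:...:table/orders/stream/..."  # placeholder

# A stream is split into shards, much like Kinesis
shard = streams.describe_stream(StreamArn=stream_arn)["StreamDescription"]["Shards"][0]
it = streams.get_shard_iterator(
    StreamArn=stream_arn,
    ShardId=shard["ShardId"],
    ShardIteratorType="TRIM_HORIZON",   # oldest record still in the 24-hour window
)["ShardIterator"]

for record in streams.get_records(ShardIterator=it)["Records"]:
    print(record["eventName"], record["dynamodb"].get("Keys"))  # INSERT/MODIFY/REMOVE
```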

AWS Kinesis

Various Kinesis services:
Kinesis Data Streams: collect streaming data, at scale, for real-time analytics (sub-second latency), custom processing, choice of processing framework, no limit on the number of shards. Each shard: 1 MB/sec write, 2 MB/sec read, 1,000 puts/sec.
Kinesis Data Firehose: prepare and load real-time data streams into data stores and analytics services; 60-second or higher latency; use existing analytics tools on S3, Redshift, Elasticsearch; zero administration; transformation of data; autoscaling to match throughput.
Kinesis Agent: a stand-alone Java application that offers an easy way to collect and send data to Kinesis Data Streams.
https://aws.amazon.com/blogs/architecture/serverless-stream-based-processing-for-real-time-insights/
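A hedged boto3 sketch of writing a single record to a Kinesis Data Stream; the stream name and payload are hypothetical:

```python
import json
import boto3

kinesis = boto3.client("kinesis")
kinesis.put_record(
    StreamName="clickstream",                                   # placeholder stream
    Data=json.dumps({"user": "42", "action": "click"}).encode(),
    PartitionKey="42",   # records with the same key land on the same shard, preserving order
)
```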

Bootable Backup

If you ever need to replace or back up your bootable hard drive, you can use software (Carbon Copy Cloner) from bombich.com to clone it. It will make a bootable clone.

Analytics

Types of Analytics (notes from an AWS course):
Descriptive - What happened? hindsight
Diagnostic - Why did it happen? hindsight & insight
Predictive - What will happen? insight & foresight
Prescriptive - What should I do? foresight
Cognitive & AI - What are the recommended actions? foresight & hypothesis input
Popular BI tools: https://www.cio.com/article/3284415/10-bi-tools-for-data-visualization.html

Apache Knox

https://www.adaltas.com/en/2019/02/04/apache-knox/

ODBC Test

If you are developing, debugging, or supporting ODBC drivers, Microsoft has a great utility called ODBC Test which appears very useful. To know more about ODBC, click on this link.

DBeaver

Recently I used this tool for connecting to Hive via the Knox gateway. A good thing about this tool is that it automatically downloads the driver and configures it; that is a pain in tools like SQuirreL. It also makes a JDBC connection to Hive. Below is how the HOST and JDBC URL settings look:
HOST: <KNOX_HOST>:8443/;ssl=true;transportMode=http;httpPath=gateway/<topology>/hive
JDBC URL: jdbc:hive2://<KNOX_HOST>:8443/;ssl=true;transportMode=http;httpPath=gateway/<topology>/hive
What I noticed on my setup was that if you keep the default port (10000) filled in while the URL uses 8443, it does not work, so you have to leave the port blank. That may be specific to my setup. Also, you might notice that ssl=true but there are no sslTrustStore=<path to the ".jks" file in the NiFi node>;trustStorePassword=<TrustStore password> parameters. The reason is that if you import the gateway cert into the trust store at this location "<DBeaver_Insta…