Posts

Showing posts from May, 2019

Apache NiFi

Apache NiFi is an open source data flow tool. It has a web user interface where you can build data flows between disparate systems. Flows can be versioned and exported. NiFi's design closely aligns with flow-based programming. It comes with pre-built processors and connectors that can connect to many systems. A NiFi cluster is based on the Zero-Master Clustering paradigm: Apache ZooKeeper elects one node as the Primary Node and one node as the Cluster Coordinator. Each node in the cluster performs the same tasks, but on a different set of data.

Kinesis vs Kafka

Kafka: you have to manage the clusters yourself, while Kinesis is a fully managed service. Kafka is more flexible and functionally richer. Reference: http://cloudurable.com/blog/kinesis-vs-kafka/index.html

Time series databases

Nice article on time series databases: https://medium.com/schkn/4-best-time-series-databases-to-watch-in-2019-ef1e89a72377

MiNiFi

Apache NiFi MiNiFi: a subproject of NiFi for collecting data where it originates (at the source).

Cloud Services Comparison

https://www.cloudhealthtech.com/blog/cloud-comparison-guide-glossary-aws-azure-gcp
https://www.business.com/articles/azure-vs-aws-cloud-comparison/

AWS S3

Use multipart upload to upload objects from 5 MB to 5 TB in size to an S3 bucket. Until you complete or abort a multipart upload, you are charged for storage of the uploaded parts and for any requests relating to those parts. You can also configure a bucket lifecycle policy to abort incomplete multipart uploads. S3 uses a global namespace, so bucket names have to be globally unique. If S3 uptime falls below the SLA during any monthly period, AWS provides service credits.

Static Web Site Hosting: The bucket name has to be the same as the "Name" value of the "A" record set (inside the Route 53 hosted zone) whose Alias Target is the S3 endpoint. Note that the Alias Target points to a region-specific S3 website endpoint, not a bucket-specific one (s3-website-us-east-1.amazonaws.com.), but the specific bucket must exist and must allow public access, at least for viewing (reading). It should say "Public" under the Access column for that bucket. If that is not the case, yo
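As a rough illustration of the multipart upload and lifecycle policy notes, here is a minimal Python sketch. It only plans part boundaries and builds a lifecycle rule as a plain dictionary; it does not call AWS. The function names, the 8 MiB default part size, the rule ID and the 7-day window are assumptions made for this example. The limits (5 MiB minimum part size, 5 GiB maximum part size, 10,000 parts per upload) are the ones S3 documents.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

MIN_PART = 5 * MIB    # S3 minimum part size (the last part may be smaller)
MAX_PART = 5 * GIB    # S3 maximum part size
MAX_PARTS = 10_000    # S3 maximum number of parts per multipart upload


def plan_parts(object_size, part_size=8 * MIB):
    """Return a list of (offset, length) tuples, one per part.

    part_size is a hypothetical default; it is grown automatically when
    the object would otherwise need more than MAX_PARTS parts.
    """
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    if not MIN_PART <= part_size <= MAX_PART:
        raise ValueError("part_size must be between 5 MiB and 5 GiB")

    # Smallest part size that keeps the upload within MAX_PARTS (ceiling division).
    smallest_allowed = -(-object_size // MAX_PARTS)
    part_size = max(part_size, smallest_allowed)
    if part_size > MAX_PART:
        raise ValueError("object is too large for a single multipart upload")

    return [
        (offset, min(part_size, object_size - offset))
        for offset in range(0, object_size, part_size)
    ]


def abort_incomplete_uploads_rule(days=7):
    """Build a lifecycle configuration that aborts stale multipart uploads.

    The dictionary follows the shape expected by boto3's
    put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...).
    The rule ID and the default of 7 days are assumptions for this sketch.
    """
    return {
        "Rules": [
            {
                "ID": "abort-incomplete-multipart-uploads",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: applies to the whole bucket
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
            }
        ]
    }
```

With a rule like this in place, parts from uploads that were never completed are cleaned up automatically after the chosen number of days, so they stop accruing storage charges.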

Teradata to Snowflake

https://copycoding.com/d/teradata-to-snowflake-migration-guide

Snowflake

Zero-copy data cloning.
Time Travel on the data (old and modified data is retained for a configurable time).
Separation of storage, metadata and compute, all three.
Snowpipe to load data from S3 into Snowflake, rather than using a virtual warehouse for this transfer.
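To make the features above a little more concrete, here is a small Python sketch that only builds the corresponding Snowflake SQL statements as strings. It does not connect to Snowflake. All table, stage and pipe names passed in are hypothetical, and the helper function names are made up for this example; the SQL forms (CLONE, AT(OFFSET => ...), DATA_RETENTION_TIME_IN_DAYS, CREATE PIPE ... AS COPY INTO) are standard Snowflake syntax.

```python
def clone_table_sql(source, target):
    # Zero-copy clone: the new table shares the source's storage until either changes.
    return f"CREATE TABLE {target} CLONE {source};"


def time_travel_sql(table, seconds_ago):
    # Time Travel: query the table as it was `seconds_ago` seconds in the past.
    return f"SELECT * FROM {table} AT(OFFSET => -{int(seconds_ago)});"


def set_retention_sql(table, days):
    # Configure how long old and modified data is retained for Time Travel.
    return f"ALTER TABLE {table} SET DATA_RETENTION_TIME_IN_DAYS = {int(days)};"


def snowpipe_sql(pipe, table, stage):
    # Snowpipe: continuous loading from an external (e.g. S3) stage into a table.
    return (
        f"CREATE PIPE {pipe} AUTO_INGEST = TRUE AS "
        f"COPY INTO {table} FROM @{stage};"
    )


if __name__ == "__main__":
    # Hypothetical object names, for illustration only.
    print(clone_table_sql("orders", "orders_dev"))
    print(time_travel_sql("orders", 3600))
    print(set_retention_sql("orders", 7))
    print(snowpipe_sql("orders_pipe", "orders", "orders_s3_stage"))
```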