AWS RedShift

Columnar storage, Parallel & distributed queries across multiple nodes, can use existing BI tools ( standard SQL  , JDBC, ODBC support) ,  scale-in/out, scale up/down, column level compression , automated admin activities like snapshots/backups etc.
--
Sample URLs

HOST :  lab.cwfgigx176xf.us-west-2.redshift.amazonaws.com
Endpoint : lab.cwfgigx176xf.us-west-2.redshift.amazonaws.com:5439/lab
JDBC :  jdbc:redshift://lab.cwfgigx176xf.us-west-2.redshift.amazonaws.com:5439/lab
ODBC : Driver={Amazon Redshift (x64)}; Server=lab.cwfgigx176xf.us-west-2.redshift.amazonaws.com; Database=lab
--
Does not use indexes for sequential scan like traditional db. 
--
To see disk capacity in cluster nodes: 

SELECT
  owner AS node,
  diskno,
  used,
  capacity,
  used/capacity::numeric * 100 as percent_used
FROM stv_partitions
WHERE host = node
ORDER BY 1, 2;

--
To see table usage ( blocks of 1MB each ) :

SELECT name, count(*) FROM stv_blocklist JOIN (SELECT DISTINCT name, id as tbl from stv_tbl_perm) USING (tbl) GROUP BY name;

--

pgweb- browser based PostgreSQL client. You can use pgweb for connecting to AWS Redshift. You can download it from https://github.com/sosedoff/pgweb/releases

--
The user '.....'  can not assume IAM role '........

You get this error when you try to execute a command with credentials which includes the arn for given IAM role and that iam role is not  attached to the cluster.  A IAM role which is attached to the cluster with appear "in-sync" status on "Manage IAM Roles" window which opens up when you click "see IAM roles" under cluster properties.
--
To keep Redshift costs low

*) Encode your columns to compress columns according to data types   
*) vacuum the tables to reclaim space for deleted data.

 "When you use Amazon Redshift Enhanced VPC Routing, Amazon Redshift forces all COPY and UNLOAD traffic between your cluster and your data repositories through your Amazon VPC. By using Enhanced VPC Routing, you can use standard VPC features, such as VPC security groupsnetwork access control lists (ACLs)VPC endpointsVPC endpoint policiesinternet gateways, and Domain Name System (DNS) servers, as described in the Amazon VPC User Guide. "   AWS documentation.

Redshift will not be able to access S3 VPC endpoint without enabling Enhanced VPC Routing.

https://aws.amazon.com/blogs/big-data/10-best-practices-for-amazon-redshift-spectrum/

--
 You can enable Cross-Region Snapshots for Redshift cluster.
--
For Redshift , you can enable encryption ( using SSE-KMS ) even after launching the cluster and it automatically encrypts the cluster after that. You can also encrypt Redshift cluster using the top level keys from HSM.
--
Scalability : You can easily scale Redshift cluster up or down. Resizing the cluster involves replacing the old cluster with new cluster. The old cluster goes into read-only mode until new cluster is up and running and then it replaces the old one. 
--

Comments

Popular posts from this blog

SQL

Analytics

HIVE