AWS, BIG DATA, JAVA

Posts

Showing posts from March, 2019

JVM Memory Model

March 26, 2019

Java Memory Model Heap : Used for allocating new objects and Java Runtime classes. Minor GC : When Eden Space ( in Young generation ) is filled, garbage collection happens, it deletes the objects not referenced, it moves the longer living objects to Surviving spaces and even longer aged objects( beyond a threshold) to Old Generation. All the Garbage Collection events are "Stop The World" events, means all the threads stop when GC thread runs. Minor GCs are fast and not noticebale but major GC events can be clearly noticeable and even cause apps to unresponsive during that time. When JVM falls short of stack space, it throws StackOverFlowError and when it runs out of heap space, it throws OutOfMemoryError. Interned Strings are kept in a Hashtable in Permanent Generation. Interned strings are the string literals appearing explicitly in code or created by explicitly calling intern() method on String object. ("j"+"v"+"m").intern...

Kafka

March 23, 2019

Kafka is streaming ( can also be used as messaging, message storage) platform which provide best of both traditional messaging models: message queues and publish-subscribe. At the same time it serves as durable distributed storage and provides in order guarantee of messages and above all it is scalable. Kafka APIs : Producer, Consumer , Stream ( Allow creating data processing pipelines ), Connector. For more details see Kafka Introduction Kafka clusters can span multiple data centers ( durable, distributed storage ) . Consider each topic like a folder in the file-system and each event as a file. Because each Kafka topic ( stream of records ) is stored in partitioned logs ( number is configurable) for configurable retention period, multiple clients can read those in the ways they want. Hence Kafka topics are multi-subscriber. Also there is very less overhead per consumer. The only metadata needed to be kept is offset ( in the partition ). T...

Cassandra

March 21, 2019

Apache Cassandra is an highly available ( fault tolerant) , linearly scalable ( allows adding nodes with limited overhead , uses consistent hashing ) , high performance ( because it can be scalable ) , NoSQL ( key/value based ) , distributed database. It is designed on Dynamo principles (incremental scalability - , symmetry - every node is same, decentralization - peer to peer interaction , Heterogeneity - allow heterogeneous infrastructure) and follows "column family" ( Google BigTable ) data modeling. It uses gossip protocol. In cassandra data is partitioned by a multi-column primary key that gets a hash value and is sent to the node whose token is numerically bigger than the hash value.

Deep Learning

March 20, 2019

AI is superset of Machine Learning. Deep Learning is a kind of machine learning where no feature knowledge is fed into the process. It needs huge amount of data for training and accuracy increases with the data. Deep learning is based on deep ( more layers) neural networks. Deep learning is a kind of machine learning ( subset ).

Design Patterns

March 20, 2019

An anti-pattern is a common response to a recurring problem that is usually ineffective and risks being highly counterproductive ( wikipedia). Strategy : This pattern allows selecting an algorithm/concrete implementation at runtime. Decouples implementation/algorithm from the context/client. Normally Strategy pattern is used in combination with Factory patten. Initialization On Demand Holder : This pattern takes care of thread-safety issues in singleton. In Singleton, unless we synchronize the part where you create the instance, there is still a chance that multiple instance might get created. See the example code in Reference below Builder: This pattern is used for complex objects creations. This decouples the complex object creation process from the object representation. Chain of Responsibility ( COR ) or Chain of Command: This pattern allows breaking down a complex operation into a sequence of small steps, thus mak...

Data Structures

March 16, 2019

Binary Search Trees are trees where keys of the nodes follow a specific order. All the keys in the left of any node are always smaller and on right are always larger. For a balanced BST the upper bound on complexity of most ( insertion/deletion/search ) of the operations is O(log n ). If the BST is not balanced , it can be O(n). So balancing the BST is very important. Red-Black Trees and AVL trees are both balanced BSTs. They just follow different types of rules. AVL trees are more balanced than Red Black Trees but they may cause more rotations during insertion/deletion. So if there are more of those operations happening Red Black trees should be preferred over AVL. Matrix * Inverse Matrix = Identify Matrix ( A matrix which has everything else 0 except diagonal values set to 1s). The transpose of a matrix is a new matrix whose rows are the columns of the original. Shortest Path from a source node to all nodes in...

Java Collections

March 13, 2019

Collections 1. Map is not really a Collection interface. It is not derived from Collection. 2. You can created one collection to another as for every collection there is constructor which takes Collection as an argument. It is called conversion constructor. 3. Every collection can be converted to an array of objects or array of a given type of objects. Object[] toArray(); T[] toArray(T[] a); 4. Collection interface extends Iterable. 5. if you want to remove the object from collection while traversing it , use iterator for the traversing and iterator.remove for removing the element. 6. Collections.singleton(e) is a static factory method that returns an immutable Set containing only the specified element. 7. When you add or remove, addAll, removeAll on the Collection, it returns true only if the Collection is modified as result of the operation. 8. HashSet is the best overall Set implementation. 9. Set : HashSet, LinkedHashSet, SortedSet 10. List ( ordered collection ) : Array...

Interview Questions - Technical

March 13, 2019

1. How to reverse an array In place O(N) : for() loop till middle of array and swap each element with element with same offset from end. Convert to ArrayList, Collections.reverse, toArray ArrayUtils.reverse ( Apache Library ) 2. Find all pairs which add up to k from an integer array. 3. Find the top 10 most popular words in a List of words. 4. How to share state between threads ? 5. How will you identify deadlock ? 6. What is busy spinning in multi-threading ? Actively waiting by a thread for some condition to be true. while ( !condition) { System.out.println("Condition still not true"); } In busy spinning , thread is not relinquishing control of CPU and other resources like it does when you call wait() or sleep(). The only positive side of this is , if the conditions becomes true quickly , it will save time as it does not have to acquire the resources or repopulate cache etc. again as it already has those. 7. What is the signific...

Interview Questions - Non Technical

March 12, 2019

1. why manhole covers are round. *so they dont fall in. *easy to manufacture. *easy to roll around. 2. 8 tennis balls exactly same look. 7 weigh same and 1 is heavier. You have one balance, find the heavier ball in 2 weighing. This is really a sorting question. You have to apply quick sort but with a twist. You have to make it even quicker. 3. You are captain of angry pirate ship. You have to divide the loot. If more than half disagree with you, you die. How will you share the loot that you get the good share and survive. (Divide and rule ) Make half of them happier, by the giving them just more than what they should expect. But here is the twist in the question, make sure those pirates do not believe in fairness and do not believe in equality and humanity. 4. Given a function which produces a random integer in the range 1 to 5, write a function which produces a random integer in the range 1 to 7. scale the random number generated. just like (7/5) * f() 5.

Akka

March 12, 2019

Akka - is a set of open source libraries which allows Actor model programming paradigm.

Actor Model

March 11, 2019

Actor model is a programming paradigm which models any software system as composed of actors which communicate with each other by sending messages (no reply hence non-blocking). Message is the only means of communication with external world for these actors. Messages are queued at each actor and are processed in sequence. This avoids the need for synchronization in object oriented world and avoids the need of costly locks and in turn deadlocks. In this manner Actor Model is supposed to meet the demands of high performance parallel processing systems of today.

Lambda Architecture vs. Lambda Expression vs. AWS Lambda

March 11, 2019

Lambda Architecture is a data processing architecture involving huge amount of data with goals of providing real time views without sacrificing accuracy. Typically when large amount of data is involved, the data processing takes time so you can not have real time views/reports. If you want real time view then you can take into account all the data so views are inaccurate. Lambda architecture tries to address it by combining real time processing on limited amount of data with batch processing on the entire data set. Lambda expression in Java 8 is a simplified expression of defining an anonymous class having single method whose type is inferred. The type of Lambda Expression( (args)->{body} ) is one of the FunctionalInterface. AWS Lambda is a AWS hosted service which allows you to execute code on certain events without worrying about infrastructure where it runs( serverless architecture).

Dynamo vs. DynamoDB

March 11, 2019

Dynamo is a set of principles/techniques for highly available distributed data storage. DynamoDB is an AWS hosted key/value based storage service which is designed based on Dynamo principles except that it does have a single leader-replication ( Dynamo recommends/suggests leaderless replication ).