- Cloudera Distribution of Apache Hadoop (CDH): Cloudera was the first commercial Hadoop startup. It offers a core open source distribution along with a number of frameworks, including Cloudera Search, Impala, Cloudera Navigator and Cloudera Manager.
- Pivotal HD: includes a number of Pivotal software products such as HAWQ (SQL engine), GemFire XD (analytics), Big Data Extensions and the USS storage abstraction. Pivotal supports building one physical platform that hosts multiple virtual clusters, as well as PaaS using Hadoop and RabbitMQ.
- IBM InfoSphere BigInsights: includes visualization and exploration, advanced analytics, security and administration. No other vendor gives you the flexibility of working on a bare metal machine, but that comes at the price of scalability: a bare metal machine can’t be scaled up or down on the fly. IBM’s other products, BigQuality, BigIntegrate and IBM InfoSphere Big Match, can be seamlessly integrated for mature enterprise operations.
- Amazon Elastic MapReduce (EMR): comes with EMRFS, which allows EMR to connect to S3 and use it as a storage layer. S3 is the market leader in object storage and many enterprises already use it for their Big Data storage, which makes EMR an obvious choice.
But AWS EMR works with AWS data stores only, and I really doubt it can be integrated with other storage options.
- Azure HDInsight: uses the HDP (Hortonworks Data Platform) distribution, designed for the Azure cloud. Enterprise architects can use C#, Java and .NET to create, configure, monitor and submit Hadoop jobs.
- Google Cloud Dataproc: has built-in integration with Google Cloud services like BigQuery and Bigtable. Unlike other vendors, Google bills you by the minute.
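To see why billing granularity matters, here is a small arithmetic sketch. The job duration and the hourly rate are hypothetical numbers chosen for illustration, and the comparison assumes the alternative is a vendor that charges every started hour in full; it is not any vendor's actual price list.

```python
import math


def per_minute_cost(minutes, cents_per_hour):
    """Bill only for the minutes actually used."""
    return minutes * cents_per_hour / 60


def per_hour_cost(minutes, cents_per_hour):
    """Bill every started hour in full."""
    return math.ceil(minutes / 60) * cents_per_hour


job_minutes = 70  # hypothetical job duration
rate = 120        # hypothetical rate, in cents per cluster hour

print(per_minute_cost(job_minutes, rate))  # 140.0
print(per_hour_cost(job_minutes, rate))    # 240
```

For short or bursty jobs the difference adds up quickly, because the unused part of the last hour is never paid for.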
Do you know what TCP/IP and Hadoop have in common? Neither was created with security in mind. And the other thing they have in common: both have become extremely important and ubiquitous.
The very thing we rejoice about, the massive amount of data created by sources like sensors and mobile devices, has given hackers multiple points of entry into an organization. Think about it: it’s not just servers exposed to the internet that can be hacked. Anything and everything that connects to your intranet is a potential source of a security breach.
Hadoop characteristics such as distributed computing, fragmented data, access to data and node-to-node communication make it a great challenge for developers to prevent security breaches. The biggest issue with Hadoop security is that Hadoop is not a single technology but an entire ecosystem of technologies: Hive, HBase, Oozie, etc.
It’s important to understand the threat categories before creating a security strategy. They can be:
- Unauthorized access/Masquerade
- Insider Threat
- Denial of Service
- Threats to Data
According to Forrester, developers must consider six security properties:
- Confidentiality: data is made available only to people who really need it
- Integrity: data is changed only in appropriate ways and only by those authorized to change it
- Availability: data is available only through applications that are allowed to make it available
- Authentication: a person’s identity is established before access is granted
- Authorization: people are explicitly allowed or not allowed to access the application
- Nonrepudiation: a person cannot perform an action and later deny having performed it
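Some of these properties are easier to grasp in code. The sketch below is only an illustration and not how Hadoop itself enforces security: the user table, permission map, secret key and all function names are hypothetical. It shows authentication (checking a credential), authorization (checking an explicit permission), integrity (a keyed digest over the data) and a simplified stand-in for nonrepudiation (an audit record).

```python
import hashlib
import hmac

# Hypothetical in-memory stores, for illustration only.
SECRET_KEY = b"demo-key"  # assumption: a shared secret used to sign data
# A real system would store salted password hashes, not plain SHA-256.
USERS = {"alice": hashlib.sha256(b"alice-password").hexdigest()}
PERMISSIONS = {"alice": {"read"}}  # explicit allow list per user
AUDIT_LOG = []  # append-only record of who did what


def authenticate(user, password):
    """Authentication: establish identity before access is granted."""
    expected = USERS.get(user)
    supplied = hashlib.sha256(password.encode()).hexdigest()
    return expected is not None and hmac.compare_digest(expected, supplied)


def authorize(user, action):
    """Authorization: the user must be explicitly allowed to act."""
    return action in PERMISSIONS.get(user, set())


def sign(data):
    """Integrity: a keyed digest that changes if the data changes."""
    return hmac.new(SECRET_KEY, data, hashlib.sha256).hexdigest()


def verify(data, signature):
    """Check that the data still matches its earlier digest."""
    return hmac.compare_digest(sign(data), signature)


def perform(user, password, action, data):
    """Run an action only if the caller is authenticated and authorized."""
    if not authenticate(user, password):
        return "denied: unknown identity"
    if not authorize(user, action):
        return "denied: not permitted"
    # Simplified nonrepudiation: keep a record of the action taken.
    AUDIT_LOG.append((user, action, sign(data)))
    return "ok"
```

Calling `perform("alice", "alice-password", "read", b"row-1")` returns `"ok"` and leaves one entry in `AUDIT_LOG`, while a wrong password or an action outside the allow list is refused before anything is recorded.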
In addition, it’s important to understand the Hadoop architecture, which comprises:
A lot also depends on the operating environment of Hadoop. It can be one of the following:
As I highlighted earlier, deploying Hadoop as a Service takes care of most of these concerns, so you can really focus on actionable insights and on making great apps for your business.
If you want to discuss Hadoop and other Big Data solutions on the cloud, just get back to me here.
IoT is the new favorite of CIOs, and there is a lot of buzz in the market. So it’s really important to understand the tangible benefits of IoT.
According to Forrester, IoT impacts businesses in two ways:
1. Enhance front-end customer experience: Every industry has its own definitions and requirements for customer experience, and it’s very difficult to list every aspect. Broadly, the following initiatives should be taken to enhance customer experience:
- Build smarter products by tapping the enormous amount of data
- Use the data to make customer order and delivery tracking more efficient and precise
- Enhance energy management by deploying intelligent sensors
- Enhance security and public safety monitoring or surveillance by making things talk to each other and managing them centrally
- Make homes smarter with connected sensors that can be controlled through handheld devices or from a remote location
2. Improve backend operational efficiency: A business is only as efficient as its backend. Processes like navigation, metering, asset tracking, notifications, monitoring and ordering support the following use cases:
- Fleet Management: Navigation systems and asset tracking systems have made it possible to reduce costs and invest the savings in core business processes.
- Logistics and Transport: It has become easy to monitor the inventory of non-stationary assets in near real time with the help of a chip. A report published by Forrester highlights how Cargo View uses smart global SIM cards to provide an automatic airplane mode during flight times, so that connected objects travel in a safe, FAA-compliant mode when on an aircraft.
- Predictive and Prescriptive Maintenance: Real-time monitoring services enable predictive and prescriptive maintenance. It’s now possible to predict a failure and take precautionary measures so that the failure can be avoided or its aftermath minimized.
- Supply Chain Management: Reliably monitoring and tracking the status of fast-moving consumer goods and assets is a competitive advantage. So don’t be surprised if tomorrow your milkman carries a bag with an embedded chip that captures location data, temperature data, etc. and transmits it to a server to take milk delivery to the next level.
- Safety Monitoring and Surveillance: Location-specific data adds tremendous relevance to IoT solutions. Imagine you have a car that continuously transmits its location to your mobile device. If it takes an unfamiliar route, you can disable the car.
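The predictive maintenance use case above can be sketched as a simple rule over sensor readings. This is a minimal illustration, assuming a hypothetical temperature sensor, window size and threshold; a real deployment would more likely rely on a trained model than on a fixed rule.

```python
from collections import deque


def maintenance_alerts(readings, window=3, threshold=80.0):
    """Return the indices at which the rolling mean exceeds the threshold.

    `window` and `threshold` are illustrative assumptions, not values
    taken from any particular device or standard.
    """
    recent = deque(maxlen=window)  # keeps only the last `window` readings
    alerts = []
    for index, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window and sum(recent) / window > threshold:
            alerts.append(index)
    return alerts


# Hypothetical temperature readings from one machine.
temperatures = [70.0, 72.0, 75.0, 81.0, 86.0, 90.0]
print(maintenance_alerts(temperatures))  # [4, 5]
```

Averaging over a window instead of reacting to a single reading avoids raising an alert on one noisy spike.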
IoT solutions are getting smarter and more attractive because of two important facts:
1. Machines can now talk to each other
2. There is a program sitting somewhere which makes the hardware, users and other machines understand what is being said, how and when it is being said, by whom and to whom.
To get you started, all you need is an IoT platform which already has most of the pieces ready for you, so you just have to start writing code. Yes, I am talking about IoT platforms on cloud!
Starting a Big Data and Analytics (BDA) project on cloud is not only faster but also much cheaper. I have come across many clients who want to start a BDA project but delay it because of the upfront cost, the required skills and the execution time. All of these can be taken care of if they start their project on cloud.
The challenge for them is to decide which cloud services to go for and which vendor to select. Here is a list of the services currently available on cloud that are worth considering when starting a BDA project.
1. Edge Services: These act as an interface between your users, your data and the cloud services provider. They serve the following purposes:
- DNS resolution
- CDN services
- Load balancers
These are typically available as IaaS from companies like IBM, AWS, Microsoft, etc.
2. Data Streaming: This is primarily for data in motion. You need data streaming for:
- Real time analytical processing
- Data Augmentation
Data streaming tools are available as SaaS on various cloud marketplaces.
3. Data Integration: Data from different sources is delivered to the cloud service provider through Edge services, then goes through the following steps so that insights can be extracted from it:
- Data Staging
- Data quality checks
- Transformation and loading
These are also available as SaaS on various cloud marketplaces.
4. Data Repositories: A data repository holds both data in motion (from streaming services) and data at rest (after the data integration process), and prepares the data for the various analytical engines. Data repositories are meant for the following functions:
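The three integration steps listed above can be sketched in plain Python. This is a minimal illustration under assumptions: the record fields (`device_id`, `temp_c`), the quality rule and the in-memory list standing in for a repository are all hypothetical and not part of any vendor's service.

```python
def stage(raw_rows):
    """Data staging: copy incoming rows into a working area untouched."""
    return [dict(row) for row in raw_rows]


def passes_quality(row):
    """Data quality check: require a device id and a numeric temperature."""
    return bool(row.get("device_id")) and isinstance(
        row.get("temp_c"), (int, float)
    )


def transform(row):
    """Transformation: normalize the id and add a Fahrenheit column."""
    return {
        "device_id": row["device_id"].strip().upper(),
        "temp_c": row["temp_c"],
        "temp_f": row["temp_c"] * 9 / 5 + 32,
    }


def integrate(raw_rows, repository):
    """Run staging, quality checks, then transformation and loading."""
    staged = stage(raw_rows)
    clean = [row for row in staged if passes_quality(row)]
    repository.extend(transform(row) for row in clean)  # loading
    return len(staged) - len(clean)  # number of rejected rows


repository = []  # hypothetical stand-in for a data repository
rejected = integrate(
    [
        {"device_id": " s1 ", "temp_c": 20},
        {"device_id": "", "temp_c": 21},       # fails: missing device id
        {"device_id": "s3", "temp_c": "n/a"},  # fails: non-numeric reading
    ],
    repository,
)
print(rejected, repository)
```

Keeping each step as its own function mirrors the way these stages are offered as separate services, and makes it easy to swap one of them out.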
- Data warehousing
- Landing, exploration and archive
- Deep Analytics and modelling
- Interactive Analytics and Reporting
Earlier, not many SaaS offerings were available for data repository services, but now there are many on different cloud marketplaces.
5. Actionable Insights: Data from the data repository is then fetched into a variety of tools to extract insights. Typically you need different tools to perform the following:
- Decision Management
- Discovery and Exploration
- Predictive Analytics
- Analysis and Reporting
- Content Analytics
- Planning and Forecasting
In addition to the above services, you also get Data Security and Governance services on cloud. Multiple vendors provide all or part of the above services. The market is flooded with services, and selecting a service or vendor requires a lot of research and many points to consider.