Knowledge base


What is a NoSQL Database?

A NoSQL database is a schema-free, non-relational database (that is, not an RDBMS) that provides a mechanism for distributed data storage and retrieval.

NoSQL, which encompasses a wide range of technologies and architectures, seeks to solve the scalability and big data performance issues that relational databases weren’t designed to address. NoSQL is especially useful when an enterprise needs to access and analyze massive amounts of unstructured data or data that’s stored remotely on multiple virtual servers in the cloud.

Contrary to misconceptions caused by its name, NoSQL does not prohibit structured query language (SQL). While it’s true that some NoSQL systems are entirely non-relational, others simply avoid selected relational functionality such as fixed table schemas and join operations. For example, instead of using tables, a NoSQL database might organize data into objects, key/value pairs or tuples.
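As a rough illustration of the difference described above, the following sketch contrasts a fixed-schema row layout with a key/value layout. The record names, keys and the `get_field` helper are invented for this example and do not correspond to any particular NoSQL product.

```python
# Hypothetical illustration: all names and data below are invented.

# Relational style: every row must fit the same fixed set of columns.
COLUMNS = ("id", "name", "email")
rows = [
    (1, "Ada", "ada@example.com"),
    (2, "Grace", None),  # a missing value still occupies its column
]

# Key/value style: each key maps to a self-describing value, and two
# values do not need to share the same fields.
store = {
    "user:1": {"name": "Ada", "email": "ada@example.com"},
    "user:2": {"name": "Grace", "languages": ["COBOL"]},
}


def get_field(key, field, default=None):
    """Return one field of the value stored under `key`, or `default`."""
    return store.get(key, {}).get(field, default)
```

Here `get_field("user:2", "languages")` finds a field that the other record simply does not have, which a fixed table schema would not allow without altering the table.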


What is business intelligence?

Business intelligence (BI) is an umbrella term that refers to a variety of software applications used to analyze an organization’s raw data. BI as a discipline is made up of several related activities, including data mining, online analytical processing, querying and reporting. BI is a technology-driven process for analyzing data and presenting actionable information to help corporate executives, business managers and other end users make more informed business decisions.

BI encompasses a variety of tools, applications and methodologies that enable organizations to collect data from internal systems and external sources, prepare it for analysis, develop and run queries against the data, and create reports, charts, dashboards and data visualizations to make the analytical results available to corporate decision makers as well as operational workers.

The potential benefits of business intelligence programs include accelerating and improving decision making; optimizing internal business processes; increasing operational efficiency; driving new revenues; and gaining competitive advantages over business rivals. BI systems can also help companies identify market trends and spot business problems that need to be addressed.

BI data can include historical information as well as new, real-time data gathered from source systems as it is generated, enabling BI analysis to support both strategic and tactical decision-making processes. Initially, BI tools were primarily used by data analysts and other IT professionals who ran analyses and produced reports with query results for business users. Increasingly, however, business executives and workers are using BI software themselves, thanks partly to the development of self-service BI and data discovery tools.

Business intelligence combines a broad set of data analysis applications, including ad hoc analysis and querying, enterprise reporting, online analytical processing (OLAP), mobile BI, real-time BI, operational BI, cloud and software as a service BI, open source BI, collaborative BI and location intelligence. BI technology also includes data visualization software for designing charts, reports, maps and other infographics, as well as tools for building BI dashboards and performance scorecards that display visualized data on business metrics and key performance indicators in an easy-to-grasp way. BI applications can be bought separately from different vendors or as part of a unified BI platform from a single vendor.

BI programs can also incorporate forms of advanced data analytics, such as data mining, data visualization, predictive analytics, text mining, GIS maps, statistical analysis and big data analytics. In many cases though, advanced analytics projects are conducted and managed by separate teams of data scientists, statisticians, predictive modelers and other skilled analytics professionals, while BI teams oversee more straightforward querying and analysis of business data.

Business intelligence data is typically stored in a data warehouse or in smaller data marts that hold subsets of a company’s information. In addition, MapReduce-style systems such as Hadoop and Nucleon grid are increasingly being used within BI architectures as repositories or landing pads for BI and analytics data, especially for unstructured data, log files, sensor data and other types of big data.

Before it’s used in BI applications, raw data from different source systems must be integrated, consolidated and cleansed using data integration and data quality tools to ensure that users are analyzing accurate and consistent information.
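To make the integrate, consolidate and cleanse step more concrete, here is a minimal sketch. It assumes two hypothetical source systems that both export customer records as dictionaries with `email` and `name` fields. The field names and the `cleanse` and `consolidate` helpers are invented for illustration, and real data quality tools do far more than this.

```python
def cleanse(record):
    """Normalize one raw record so that equal customers compare equal."""
    return {
        "email": record["email"].strip().lower(),
        "name": " ".join(record["name"].split()).title(),
    }


def consolidate(*sources):
    """Merge records from several sources, keeping one per email address."""
    merged = {}
    for source in sources:
        for record in source:
            clean = cleanse(record)
            # The first source to mention an email address wins.
            merged.setdefault(clean["email"], clean)
    return list(merged.values())
```

Calling `consolidate(crm_records, billing_records)` on two such lists would return one cleaned record per distinct email address, in the order the addresses were first seen.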


Connecting to a Remote Database Server through a Firewall and SSH

This article explains how to connect to a MySQL database server via SSH through a firewall. A large percentage of database users run SQL Server, MySQL, PostgreSQL or another database on a web server hosted by an ISP. Most hosting providers block port 3306 (the MySQL server port) at the firewall, preventing outside access to MySQL. This is an important security practice and you should be very concerned if your ISP does not block port 3306. In this article I will demonstrate how to connect client tools, including Nucleon Database Master and MySQL Administrator, to a remote server using SSH port forwarding.

What Is SSH?

SSH stands for Secure Shell and is typically used as an encrypted replacement for telnet. SSH allows you to access a remote server’s shell without compromising security. In a telnet session all communications, including username and password, are transmitted in plain text, allowing anyone with adequate resources to listen in on your session and steal passwords and other information. Such sessions are also susceptible to session hijacking, where a malicious user takes over your session once you have authenticated. SSH serves to prevent such vulnerabilities.

OpenSSH, the tool included with most Linux variants, is described as follows in the OpenSSH FAQ at http://www.openssh.org/faq.html#1.1 :

“OpenSSH is a FREE version of the SSH suite of network connectivity tools that increasing numbers of people on the Internet are coming to rely on. Many users of telnet, rlogin, ftp, and other such programs might not realize that their password is transmitted across the Internet unencrypted, but it is. OpenSSH encrypts all traffic (including passwords) to effectively eliminate eavesdropping, connection hijacking, and other network-level attacks.”

 

What Is SSH Port Forwarding?

When a mysql client communicates with the MySQL server, all communication (with the exception of the user password) is done in plain text. What this means is that if an unscrupulous individual gets between your client and the server, they can have full access to all information transmitted. In order to protect your information you need to encrypt communications between the MySQL server and the GUI client.

SSH can be used to encrypt communications between the client and server. This is known as SSH port forwarding or SSH tunneling. One benefit of SSH port forwarding is that we can connect to a MySQL server from behind a firewall when the MySQL server port is blocked.

[Figure: diagram of the SSH tunnel]

SSH will listen on a specified port on the client machine, encrypt the data it receives, and forward it to the remote SSH host on port 22 (the SSH protocol port). The remote SSH host will then decrypt the data and forward it to the MySQL server. The SSH host and the MySQL server do not have to be on separate machines, but separate SSH and MySQL servers are supported.
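For readers using the OpenSSH command-line client rather than a graphical tool, the forwarding described above corresponds to the `ssh` options `-L` (local port forward) and `-N` (do not run a remote command). The sketch below only assembles the argument list. The function name, user name and host name are placeholders invented for this example.

```python
def build_tunnel_command(ssh_user, ssh_host, local_port=3306,
                         mysql_host="127.0.0.1", mysql_port=3306):
    """Build an OpenSSH argument list for a local port forward.

    `mysql_host` is resolved from the SSH host's point of view, so
    127.0.0.1 means "MySQL runs on the SSH host itself".
    """
    forward = f"{local_port}:{mysql_host}:{mysql_port}"
    # ssh connects to the remote host on port 22 unless told otherwise.
    return ["ssh", "-N", "-L", forward, f"{ssh_user}@{ssh_host}"]
```

The resulting list can be passed to `subprocess.Popen` to open the tunnel. Terminating that process closes the tunnel again.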

Requirements for SSH and MySQL

To perform port forwarding between Nucleon Database Master and the MySQL server, you will need an SSH login account for port forwarding. This account needs to be located either on the server running MySQL or on a machine that can be accessed remotely via SSH and that in turn has network access to the MySQL server.

For this article we will be using Putty, an Open Source SSH client application written by Simon Tatham and available at http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html . Linux users should have a command-line SSH client already installed.

Creating the SSH Tunnel

We will first need to configure a Putty session for port-forwarding. Our first step is to configure Putty to connect via the SSH protocol and specify the server address:

[Figure: configuring the SSH protocol and server address]

Once we have configured our host and protocol, we can move on to configuring the SSH tunnel. To create an SSH tunnel, we specify a destination host and port:

[Figure: setting up the tunnel]

In this example, we are specifying that port 3306 on our client machine should be forwarded to port 3306 on the remote server. You can also forward data from a local port number that is different than the remote port number. For example, on my development machine I keep a local copy of MySQL running on port 3306. On my production server I also run MySQL on port 3306. I can configure port forwarding with port 3306 as the local port, but all traffic on port 3306 will be intercepted and forwarded, making the local copy of MySQL unreachable. If I change the source port setting to 3307, I can access the remote server through port 3307 and the local server through port 3306.
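The choice between 3306 and 3307 as the local port can also be made programmatically. The following sketch tries to bind the preferred port on the loopback interface and falls back to the alternative if something (for example a local MySQL server) already occupies it. The helper names are invented for this example.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def choose_local_port(preferred=3306, fallback=3307):
    """Use the preferred port unless a local service already holds it."""
    return preferred if port_is_free(preferred) else fallback
```

The check is only a snapshot: another program could claim the port between the check and the moment the tunnel is opened, so the tunnel client's own error message remains the final authority.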

When the remote SSH host is on a different machine than the MySQL server, replace 127.0.0.1 with the IP address of the MySQL server (relative to the SSH host).

Once we have added our port forwarding directives, we can then save the session to make it available for repeated use. In the session menu specify a saved session name and click save to add this to the list of saved sessions:

[Figure: saving the Putty session for SSH port forwarding to MySQL]

Once you have created and saved your session, you can add a shortcut to your desktop to quickly access port forwarding. Right-click on your desktop and choose New > Shortcut. Configure the shortcut and assign the target as /path/to/putty/putty.exe -load sessionname. In the example above, with putty at C:\putty.exe and the profile saved under the name MySQLTunnel, you would assign the shortcut target to be:

C:\putty.exe -load MySQLTunnel

To open the session, double-click the icon and provide a username and password when prompted. A Putty window will open and SSH port forwarding will be established after you successfully log in. When you are finished using the tunnel you can close the Putty window to end SSH port forwarding.

Using the SSH Tunnel

Once SSH port forwarding is established, open your client application (I will use the MySQL Query Browser in this example).

[Figure: Database Master MySQL login dialog]

Set the server host to 127.0.0.1, using port 3306 (unless you configured a different port for your tunnel to prevent conflicts with a local copy of MySQL).

When you connect, Putty will act as a proxy and the client will connect to the remote copy of MySQL through the ISP’s firewall. As an added benefit, these communications will be protected by the encryption capabilities of SSH, preventing third parties from eavesdropping on your MySQL session.
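If the client fails to connect, a quick way to tell whether the tunnel itself is up is to test whether anything accepts TCP connections on the local end. This is a minimal sketch with an invented helper name. It only confirms that a listener exists on the port, not that MySQL is answering at the far end.

```python
import socket


def tunnel_is_listening(port=3306, host="127.0.0.1", timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections and timeouts alike.
        return False
```

Pass the same port number you entered as the source port of the tunnel, for example `tunnel_is_listening(3307)` if you moved the tunnel away from a local MySQL instance.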

Conclusion

SSH port forwarding is a valuable tool for communicating with remote MySQL servers securely, especially when the remote server is protected by a firewall. While an SSH account on the remote server is required, many ISPs are willing to provide one. SSH port forwarding can be used to protect all MySQL client sessions, including Database Master, MySQL Administrator, mysqldump, etc.


What is business intelligence software?

Business intelligence (BI) refers to skills, technologies, applications and practices used to help a business acquire a better understanding of its commercial context. Business intelligence may also refer to the collected information itself.

BI technologies provide historical, current, and predictive views of business operations. Common functions of business intelligence technologies are reporting, OLAP, analytics, data mining, business performance management, benchmarks, text mining, and predictive analytics.

Business intelligence often aims to support better business decision-making. Thus a BI system can be called a decision support system (DSS).

BI is not only reporting. BI platforms enable users to build applications that help organizations learn about and understand their business. Gartner defines [1] a BI platform as a software platform that delivers the following 12 capabilities:

 

Integration

BI infrastructure — All tools in the platform should use the same security, metadata, administration, portal integration, object model and query engine, and should share the same look and feel.

Metadata management — This is arguably the most important of the 12 capabilities. Not only should all tools leverage the same metadata, but the offering should provide a robust way to search, capture, store, reuse and publish metadata objects such as dimensions, hierarchies, measures, performance metrics and report layout objects.

Development — The BI platform should provide a set of programmatic development tools — coupled with a software developer’s kit for creating BI applications — that can be integrated into a business process, and/or embedded in another application. The BI platform should also enable developers to build BI applications without coding by using wizard-like components for a graphical assembly process. The development environment should also support Web services in performing common tasks such as scheduling, delivering, administering and managing.

Workflow and collaboration — This capability enables BI users to share and discuss information via public folders and discussion threads. In addition, the BI application can assign and track events or tasks allotted to specific users, based on pre-defined business rules. Often, this capability is delivered by integrating with a separate portal or workflow tool.

 

Information Delivery

Reporting — Reporting provides the ability to create formatted and interactive reports with highly scalable distribution and scheduling capabilities. In addition, BI platform vendors should handle a wide array of reporting styles (for example, financial, operational and performance dashboards).

Dashboards — This subset of reporting includes the ability to publish formal, Web-based reports with intuitive displays of information, including dials, gauges and traffic lights. These displays indicate the state of the performance metric, compared with a goal or target value. Increasingly, dashboards are used to disseminate real-time data from operational applications.

Ad hoc query — This capability, also known as self-service reporting, enables users to ask their own questions of the data, without relying on IT to create a report. In particular, the tools must have a robust semantic layer to allow users to navigate available data sources. In addition, these tools should offer query governance and auditing capabilities to ensure that queries perform well.

Microsoft Office integration — In some cases, BI platforms are used as a middle tier to manage, secure and execute BI tasks, but Microsoft Office (particularly Excel) acts as the BI client. In these cases, it is vital that the BI vendor provides integration with Microsoft Office, including support for document formats, formulas, data “refresh” and pivot tables. Advanced integration includes cell locking and write-back.
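The traffic-light displays mentioned under Dashboards reduce to comparing a metric with its target. The sketch below shows one possible rule. The function name, the colour names and the 90 percent warning threshold are arbitrary assumptions for this example and do not come from any specific BI product.

```python
def traffic_light(actual, target, warn_ratio=0.9):
    """Classify a metric against its target as green, amber or red."""
    if target <= 0:
        raise ValueError("target must be positive")
    ratio = actual / target
    if ratio >= 1.0:
        return "green"   # target met or exceeded
    if ratio >= warn_ratio:
        return "amber"   # close to target, worth watching
    return "red"         # clearly below target
```

A dashboard would typically evaluate such a rule for each key performance indicator every time the underlying data is refreshed.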

 

Analysis

OLAP — This enables end users to analyze data with extremely fast query and calculation performance, enabling a style of analysis known as “slicing and dicing.” This capability could span a variety of storage architectures such as relational, multidimensional and in-memory.

Advanced visualization — This provides the ability to display numerous aspects of the data more efficiently by using interactive pictures and charts, instead of rows and columns. Over time, advanced visualization will go beyond just slicing and dicing data to include more process-driven BI projects, allowing all stakeholders to better understand the workflow through a visual representation.

Predictive modeling and data mining — This capability enables organizations to classify categorical variables and estimate continuous variables using advanced mathematical techniques.

Scorecards — These take the metrics displayed in a dashboard a step further by applying them to a strategy map that aligns key performance indicators to a strategic objective. Scorecard metrics should be linked to related reports and information in order to do further analysis. A scorecard implies the use of a performance management methodology such as Six Sigma or a balanced scorecard framework.
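The “slicing and dicing” style of analysis described under OLAP can be illustrated with a toy in-memory fact table. Everything below (the sales figures, the dimension names and the helper functions) is invented for this sketch. A real OLAP engine adds indexing, pre-aggregation and far larger data volumes.

```python
from collections import defaultdict

# Hypothetical fact table: (region, product, quarter, revenue)
DIMENSIONS = ("region", "product", "quarter")
SALES = [
    ("EU", "widgets", "Q1", 120.0),
    ("EU", "gadgets", "Q1", 80.0),
    ("US", "widgets", "Q1", 200.0),
    ("US", "widgets", "Q2", 150.0),
    ("EU", "widgets", "Q2", 100.0),
]


def slice_cube(facts, **fixed):
    """Keep facts matching the fixed dimension values.

    Fixing one dimension is a "slice"; fixing several is a "dice".
    """
    index = {name: i for i, name in enumerate(DIMENSIONS)}
    return [f for f in facts
            if all(f[index[dim]] == value for dim, value in fixed.items())]


def roll_up(facts, dimension):
    """Sum revenue grouped by a single dimension."""
    i = DIMENSIONS.index(dimension)
    totals = defaultdict(float)
    for fact in facts:
        totals[fact[i]] += fact[3]
    return dict(totals)
```

For example, `roll_up(slice_cube(SALES, quarter="Q1"), "product")` restricts the data to the first quarter and then totals revenue per product.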

References:
[1] http://web.archive.org/web/20081216003746/http://mediaproducts.gartner.com/reprints/microsoft/vol7/article3/article3.html