A box plot is a tool in **descriptive statistics** that gives you a clear picture of the **distribution**, the most important values, and any **outliers** in your dataset at the same time.

This method was first introduced for a wider audience in 1969 by John W. Tukey, who used **boxes** and so-called **whiskers** to visualize distribution and variability in data.

Box plots also allow quick **investigation** of one or more datasets. They may seem simple at first sight, but they take up less **space** than a **histogram** and make it easy to compare several **groups of data**, which makes them a useful tool in the analytical process.

## What a Box Plot Tells You

A box plot is a graphical method for depicting the **five-number summary** of a dataset.

- **The minimum** is the lowest point of your data, excluding any possible outliers.
- The first quartile or **lower quartile (Q1)** is the median of the **lower half** of your dataset.
- **Median** is the middle value of your dataset. (Do not confuse it with the **mean**.)
- The third quartile or **upper quartile (Q3)** is the median of the **upper half** of your dataset.
- **Maximum** is the largest point of your data, excluding any possible outliers.

The **interquartile range** (IQR), which is the distance between the upper and lower quartiles, is drawn as the **box**.
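The five values can be computed by hand in a few lines. The following is a minimal sketch, assuming the quartiles are defined as the medians of the lower and upper halves (as described above); other tools may use slightly different quartile conventions. The function name `five_number_summary` and the sample data are made up for illustration, and for simplicity the sketch returns the raw minimum and maximum without excluding outliers.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]           # values below the middle position
    upper = data[half + n % 2:]   # values above the middle position
    return (data[0], median(lower), median(data), median(upper), data[-1])


sample = [2, 4, 4, 5, 7, 8, 9, 11, 12, 15]  # hypothetical data
print(five_number_summary(sample))  # (2, 4, 7.5, 11, 15)
```

Here the IQR, and therefore the length of the box, is 11 - 4 = 7.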

## How Box Plots Work

When a box plot is created, the **box is drawn** between the first and the third quartile, and a **line** inside the box marks the median. If there are any **outliers** in the data, they are shown as individual marks such as circles, stars, or X signs.

**Whiskers** connect the minimum and maximum values to the box-shaped part of the graph, hence the name **box and whisker plot**. Tukey originally used the whiskers to indicate the **symmetry** and **normality** of the distribution as well.

Box plots divide your data into **four sections**, each containing approximately 25 percent of the data in the set. Under a **normal distribution**, or any other symmetric distribution, the sections mirror each other around the median, but in most cases the sections differ in length.

The **spread of data** can be read from the distance between the minimum and maximum values at the ends of the whiskers.

**Skewness** can also be judged from a box plot. If the median sits closer to the lower quartile and the upper whisker is the longer one, the data has a positive (right) skew. If the median sits closer to the upper quartile and the lower whisker is the longer one, the skew is negative (left).
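The position of the median inside the box can also be turned into a number. One option, not mentioned in the original text and shown here only as an illustrative sketch, is the quartile (Bowley) skewness coefficient, (Q3 + Q1 - 2 * median) / (Q3 - Q1). It is positive when the upper part of the box is longer and negative when the lower part is longer. The function name and data are hypothetical, and the quartiles are again taken as medians of the lower and upper halves.

```python
from statistics import median


def quartile_skewness(values):
    """Bowley's quartile skewness: ranges from -1 (left skew) to +1 (right skew)."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    q1 = median(data[:half])
    q2 = median(data)
    q3 = median(data[half + n % 2:])
    if q3 == q1:
        raise ValueError("IQR is zero, skewness is undefined")
    return (q3 + q1 - 2 * q2) / (q3 - q1)


right_skewed = [1, 2, 2, 3, 3, 4, 6, 9, 14, 20]  # hypothetical data
symmetric = [1, 2, 3, 4, 5, 6, 7]                 # hypothetical data
print(quartile_skewness(right_skewed) > 0)  # True
print(quartile_skewness(symmetric))
```

This measure only looks at the box, not the whiskers, so it complements rather than replaces a visual check of whisker lengths.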

## How Box Plot Identifies Outliers

An **outlier** is a data point that is numerically far from the rest of the set.

Identifying outliers in data with graphical methods is possible, but the outcome is not always easy to interpret. If you are using a Gaussian curve, for example, you will not be able to see the **extremity** of these values.

However, box plots are designed to **visualize** outliers. They appear as small marks beyond the ends of the whiskers.

In this **example**, outliers are the points that lie more than 1.5 times the interquartile range below the lower quartile or above the upper quartile (that is, below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR).
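The 1.5 * IQR rule can be sketched in code as follows. This is a minimal illustration, assuming quartiles are computed as medians of the lower and upper halves; the function name `tukey_outliers`, the multiplier parameter `k`, and the sample data are hypothetical. The whiskers are placed at the most extreme values that are still inside the fences.

```python
from statistics import median


def tukey_outliers(values, k=1.5):
    """Return the fences, whisker ends, and outliers under the k * IQR rule."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    q1 = median(data[:half])
    q3 = median(data[half + n % 2:])
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    # Points inside the fences define the whisker ends; the rest are outliers.
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "fences": (low_fence, high_fence),
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }


sample = [2, 4, 4, 5, 7, 8, 9, 11, 12, 40]  # hypothetical data
print(tukey_outliers(sample))
```

For this sample, Q1 is 4 and Q3 is 11, so the fences are -6.5 and 21.5. The value 40 is flagged as an outlier and the upper whisker ends at 12.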

## Box Plot in Statistics

Since its first appearance, the box plot has been favored by researchers for the insights it provides and the space it saves. Different **variations** of the box plot have evolved over the years that convey more information and help to recognize the differences between datasets. The two most common variations are the **variable width** and the **notched box plot**.

A **variable width box plot** shows the size of each plotted data group by making the **width** of the box proportional to the **square root** of the size of the group.
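As a rough sketch of this scaling, the widths can be derived from the group sizes as shown below. The function name `relative_box_widths` and the `max_width` parameter are assumptions made for this illustration; the largest group simply gets the maximum width and the others are scaled down by the ratio of the square roots.

```python
from math import sqrt


def relative_box_widths(group_sizes, max_width=0.8):
    """Scale box widths so each one is proportional to sqrt(group size)."""
    if not group_sizes or min(group_sizes) <= 0:
        raise ValueError("group sizes must be positive")
    roots = [sqrt(n) for n in group_sizes]
    largest = max(roots)
    return [max_width * r / largest for r in roots]


sizes = [25, 100, 400]  # hypothetical group sizes
print(relative_box_widths(sizes))
```

A group that is four times larger gets a box that is twice as wide. The resulting list could then be handed to a plotting library that accepts per-box widths, for example the `widths` argument of matplotlib's `boxplot`.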

A **notched box plot** is narrower around the median. The **width** of the notch is proportional to the **IQR** of the sample and **inversely** proportional to the **square root** of the size of the sample.

Notched boxes can also indicate a statistically significant difference between medians. If the notches of two boxes do not overlap, that is strong evidence that the medians differ.
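The notch and the overlap check can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the notch is taken as median ± 1.57 * IQR / sqrt(n), where 1.57 is a commonly used constant (the exact value differs slightly between tools), and the quartiles are medians of the lower and upper halves. The function names and data are hypothetical.

```python
from math import sqrt
from statistics import median


def notch_interval(values, factor=1.57):
    """Return (low, high) notch limits: median +/- factor * IQR / sqrt(n)."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    q1 = median(data[:half])
    q3 = median(data[half + n % 2:])
    half_width = factor * (q3 - q1) / sqrt(n)
    m = median(data)
    return (m - half_width, m + half_width)


def notches_overlap(a, b):
    """True if the notch intervals of the two samples overlap."""
    low_a, high_a = notch_interval(a)
    low_b, high_b = notch_interval(b)
    return low_a <= high_b and low_b <= high_a


group_a = list(range(1, 21))    # hypothetical data, median 10.5
group_b = list(range(31, 51))   # hypothetical data, median 40.5
group_c = list(range(3, 23))    # hypothetical data, median 12.5
print(notches_overlap(group_a, group_b))  # False
print(notches_overlap(group_a, group_c))  # True
```

Non-overlapping notches, as for `group_a` and `group_b`, suggest that the medians differ. Overlapping notches do not prove that the medians are equal, only that this quick check does not separate them.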

Now, let us have a closer look at the creation of box plots by using the data visualisation platform **AnswerMiner**.

## How to Create Box Plot with AnswerMiner

In order to create a box plot, you have to open your **dataset** in AnswerMiner first. If you don't have one, use our **sample** sets.

Pick your variables using the **Suggested Charts** feature, and AnswerMiner's smart algorithm will build the box plot visualization from that data. The other way is to create a box plot directly by clicking on the **Create Chart** button on the left sidebar.

After clicking on the **Box Plot icon**, you can set up your chart. In this example, we are using the **Sales OrderTable** sample dataset to visualize the differences between countries.

Choose the **Sales_value** as Size and **Country** as Stack values.

The visualization is almost complete; only the color is missing. In this case, we use the median of **Price of each** for colorization. The results are as follows.

*Sales Value by Country Colored by Price of Each Product*

## Summary

A box plot, or box and whisker plot, is an important **graphic visualization method** for overseeing and comparing multiple data groups at the same time. It can also be used to detect **outliers** in data at an early stage.

**Spread and skewness** of data can be shown by the different placement and size of the box and of the whiskers.

Most importantly, box plots describe and **summarize** the five core values of a given set of data in an easy-to-understand way.