How to Create a Data Frame in Pandas.

DataFrame is one of the most important data structures in Pandas when it comes to Data Analysis. Let's learn how to create a DataFrame.


Hello Learners,

Welcome to Data Analysis in Python Pandas. The DataFrame is one of the most important data structures in Pandas, and most data analysis work involves DataFrames. So, let's learn how to create a DataFrame in Pandas.

In the last article, we learned how to create a Series in Python Pandas. If you have not read it yet, you can read it here, since this article is a continuation of it.

So, let's get started.

What are Dataframes?

You can think of a DataFrame as a table in SQL; the two are very similar. A DataFrame is a tabular data structure that stores data in rows and columns. It is a 2-dimensional data structure that holds heterogeneous data with labeled axes.

If you remember, a Series has an index array containing a label for each element. Similarly, a DataFrame has two labeled axes, one for rows and one for columns.

Rows (index labels 0 and 1) and columns (column 1 and column 2):

   column 1  column 2
0         1         1
1         2         2
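To make the two labeled axes concrete, here is a minimal sketch that rebuilds the small table above and prints its row labels and column labels. The column names are taken from the table; everything else is just for illustration.

```python
import pandas as pd

# A small frame matching the table above: two rows, two columns.
df = pd.DataFrame({'column 1': [1, 2], 'column 2': [1, 2]})

print(df.index)    # labels of the row axis (default integer labels here)
print(df.columns)  # labels of the column axis
print(df.shape)    # (number of rows, number of columns)
```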

How to create a DataFrame?

In data analysis, a Pandas DataFrame is usually created by loading data from a SQL database, a CSV file, or an Excel file. But we can also create a DataFrame from scratch using a list, a dictionary, a list of dictionaries, or even Series. There are various ways to create a DataFrame.
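As a quick illustration of the loading route, the sketch below reads CSV data with pd.read_csv. The CSV text is made up for this example and is read from an in-memory string so the snippet runs on its own; in practice you would usually pass a file path instead.

```python
import io
import pandas as pd

# Hypothetical CSV content, invented for this example.
csv_text = "one,two\n1,2.0\n2,3.2\n"

# read_csv accepts a path or a file-like object; StringIO stands in for a file.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```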

The syntax for creating a DataFrame is:

pandas.DataFrame(data, index, columns, dtype)

Here:

  • data can be

    • a list, a list of lists, a list of dictionaries, etc.
    • a dictionary of lists, ndarrays, Series, or dictionaries.
  • index: optionally, the array of row labels for the data frame. If it is not passed and the data carries no row labels of its own, the default labels 0, 1, ..., len(data)-1 are assigned.

  • columns: optionally, the array of column labels for the data frame. If it is not passed and the data carries no column labels of its own, the default labels 0, 1, ..., n-1 are assigned, where n is the number of columns.

  • dtype: optionally, a single data type to force on the data. If it is not passed, the type of each column is inferred.
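The parameters above can be sketched as follows. The labels r1, r2, x, and y are arbitrary names chosen for this example.

```python
import pandas as pd

data = [[1, 2], [3, 4]]

# No index or columns passed: both axes get default labels 0, 1, ...
df_default = pd.DataFrame(data)

# Explicit labels, and dtype=float to force every column to float64.
df_float = pd.DataFrame(data, index=['r1', 'r2'], columns=['x', 'y'], dtype=float)

print(df_default)
print(df_float)
print(df_float.dtypes)
```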

Now that we have an idea of what a data frame is, let's look at how we can create a DataFrame using different data inputs.

Dictionary as Dataframe

The most common way to create a Dataframe is to pass a dictionary as data input to the pandas DataFrame() constructor.

The dictionary contains a key for each column that you want to define, with a list of values for each of them.

When we use a dictionary of lists as data, then all lists must be of the same length. If an index is passed, it must also be the same length as the arrays/list. If no index is passed, the index will be range(n), where n is the array length.
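To see why the lengths matter, here is a small sketch with lists of different lengths. The dictionary named bad is a made-up example; as far as I know, Pandas raises a ValueError in this situation.

```python
import pandas as pd

# Lists of length 3 and 2: the columns cannot be lined up row by row.
bad = {'one': [1, 2, 3], 'two': [2.0, 3.2]}

try:
    pd.DataFrame(bad)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```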

Let's start by importing the NumPy and Pandas libraries.

import numpy as np
import pandas as pd
# Creating dataframe using dictionary of lists.
d = {'one':[1, 2, 3, 4], 'two':[2.0, 3.2, 4.5, 5.5]}

df = pd.DataFrame(d)
df

Output

    one     two
0     1     2.0
1     2     3.2
2     3     4.5
3     4     5.5

Did you notice the index? We did not specify an index while creating the DataFrame, so the default index 0, 1, 2, 3 was assigned, since each list in the dictionary has length 4. The keys of the dictionary become the column names and the values become the data.

But if you want to assign your own labels to the rows of a DataFrame, you have to use the index option.

# creating DataFrame with the specified index.
d = {'one':[1, 2, 3, 4], 'two':[2.0, 3.2, 4.5, 5.5]}
index = ['a', 'b', 'c', 'd']

df = pd.DataFrame(d, index=index)
df

Output

    one     two
a     1     2.0
b     2     3.2
c     3     4.5
d     4     5.5

If you want to use only particular columns of the dictionary when creating the data frame, pass a sequence of column names to the columns option of the DataFrame constructor.

Let's select only column one from the dictionary we defined above, reusing the same index.

df = pd.DataFrame(d, index=index, columns=['one'])
df

Output

    one
a    1
b    2
c    3
d    4
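The columns option also controls the column order, and a name that is not a key in the dictionary produces a column of missing values. Here is a short sketch; the name three is deliberately absent from the dictionary.

```python
import pandas as pd

d = {'one': [1, 2, 3, 4], 'two': [2.0, 3.2, 4.5, 5.5]}

# Columns appear in the requested order; 'three' is not a key in d,
# so that column is filled with NaN.
df = pd.DataFrame(d, columns=['two', 'one', 'three'])
print(df)
```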

ndarray as Dataframe

Now that we know how to use the index and columns parameters of the DataFrame constructor, we can easily define a DataFrame from other inputs.

Instead of a dictionary, we can pass three arguments to the constructor: an ndarray as data, an array of row labels for the index option, and an array of column names for the columns option.

# Create DataFrame using ndarray.
arr = np.arange(16).reshape((4, 4))
index = ['a', 'b', 'c', 'd']
columns = ['one', 'two', 'three', 'four']

df = pd.DataFrame(data = arr, index=index, columns = columns)
df

Output

   one  two  three  four
a    0    1      2     3
b    4    5      6     7
c    8    9     10    11
d   12   13     14    15
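With an ndarray, the labels have to match the shape of the array. The sketch below passes only three row labels for a four-row array to show what happens; the variable shape_error is just a helper name for this example.

```python
import numpy as np
import pandas as pd

arr = np.arange(16).reshape((4, 4))

try:
    # Only three row labels for a four-row array.
    pd.DataFrame(arr, index=['a', 'b', 'c'])
    shape_error = None
except ValueError as err:
    shape_error = str(err)

print(shape_error)
```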

List of lists as Dataframe

Similarly, we can create a DataFrame from a list or a list of lists.

# Dataframe from a simple list.
df = pd.DataFrame([1, 2, 3, 4], index=['a', 'b', 'c', 'd'], columns=['one'])
print(df)

Output

   one
a    1
b    2
c    3
d    4

# Create DataFrame using a list of lists.
arr = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
index = ['a', 'b', 'c']
columns = ['one', 'two', 'three']

df = pd.DataFrame(data = arr, index=index, columns=columns)
print(df)

Output

   one  two  three
a    1    2      3
b    4    5      6
c    7    8      9

As you can see in the above code example, each inner list becomes a row: the first list is the first row, the second list is the second row, and so on.
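A list of dictionaries, mentioned earlier as another possible input, works in a similar row-by-row way: each dictionary becomes a row and its keys become the column names. Here is a minimal sketch with made-up records; a key missing from one dictionary shows up as NaN in that row.

```python
import pandas as pd

# Each dictionary is one row; the first one has no 'three' key.
records = [
    {'one': 1, 'two': 2},
    {'one': 4, 'two': 5, 'three': 6},
]

df = pd.DataFrame(records, index=['a', 'b'])
print(df)
```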

Series as Dataframe

We can create a DataFrame from Series. The resulting index will be the union of the indexes of the individual Series.

# Series
a = pd.Series([1, 2, 3], index = ['a', 'b', 'c'], dtype=float)
b = pd.Series([11, 22, 33, 44], index = ['a', 'b', 'c', 'd'])

# DataFrame using series `a` and `b`.
df = pd.DataFrame({'first':a, 'second':b})
df

Output

   first  second
a    1.0      11
b    2.0      22
c    3.0      33
d    NaN      44

Here, the index of the new DataFrame is the union of the indexes of all the Series. Since the index label d is not present in Series a, that position is filled with NaN, which marks missing data.
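If you want to locate such missing values, one option is the isna() method. The sketch below rebuilds the same DataFrame so it can run on its own.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=['a', 'b', 'c'], dtype=float)
b = pd.Series([11, 22, 33, 44], index=['a', 'b', 'c', 'd'])
df = pd.DataFrame({'first': a, 'second': b})

# Boolean frame: True wherever a value is missing.
print(df.isna())

# Number of missing values in each column.
print(df.isna().sum())
```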

There are various ways to create a Dataframe in Pandas, and we learned some of the most common ways to create them.

Thank you for reading. See you in my next article.
