Indexing Methods of Pandas DataFrame (Part 1)

Hello,

Welcome to the Data Analysis in Python Pandas series. In this post, we'll explore different ways to access elements, rows, and columns of a DataFrame. Indexing is one of the most useful and powerful features of Pandas, and it is how we access the data in a DataFrame.

Accessing elements of a DataFrame is very similar to accessing elements of Python lists and NumPy arrays. The Python and NumPy indexing operator [] and the attribute operator . provide quick and easy access to Pandas data structures. For production code, however, the dedicated Pandas data access methods are recommended.

Let's start by importing the pandas library and loading the dataset. If you need to know how to read different file types in Pandas, see the Reading and Writing files in Pandas post.

import pandas as pd
from sklearn.datasets import load_iris

data = load_iris()
data = pd.DataFrame(data=data['data'], columns=data['feature_names'])

# print first five rows.
data.head()

Output

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2
3                4.6               3.1                1.5               0.2
4                5.0               3.6                1.4               0.2

Accessing Columns of DataFrame

As we know, Pandas data structures have axis labels. A DataFrame has two labeled axes, one for rows and one for columns. The column labels are the column names; ours are sepal length (cm), sepal width (cm), petal length (cm), and petal width (cm). By default, the row labels are integers starting at 0. Using these labels, we can access the columns and rows of a DataFrame.
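To make the two axes concrete, here is a minimal sketch that inspects both sets of labels. It uses a small hand-built frame (an assumption, so the snippet runs without scikit-learn) with two of the iris column names; the variable names are only illustrative.

```python
import pandas as pd

# A small stand-in frame that reuses two of the iris column names.
df = pd.DataFrame(
    {
        "sepal length (cm)": [5.1, 4.9, 4.7],
        "sepal width (cm)": [3.5, 3.0, 3.2],
    }
)

# Labels along the column axis: the column names.
column_labels = list(df.columns)

# Labels along the row axis: the default integer index.
row_labels = list(df.index)

print(column_labels)
print(row_labels)
```

The same two attributes, columns and index, are available on the full iris DataFrame loaded earlier.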

Single column selection

To select a single column from the DataFrame, use the basic indexing [] operator. We can access a particular column by passing its name inside the square brackets, as shown in the syntax below.

Syntax

DataFrame['column name']

# selecting first column with column name.
data['sepal length (cm)']

Output

0      5.1
1      4.9
2      4.7
3      4.6
4      5.0
      ... 
145    6.7
146    6.3
147    6.5
148    6.2
149    5.9
Name: sepal length (cm), Length: 150, dtype: float64

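You can also select a single column with the attribute (.) operator, but only when the column name is a valid Python identifier: it cannot contain spaces, and it cannot be a number such as 0 or 1. The name also must not clash with an existing DataFrame attribute or method. None of the column names in our dataset qualify, because they contain spaces and parentheses.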
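Here is a small sketch of where attribute access works and where it does not. The frame and its column names (sepal_length and count) are made up for illustration; they are not part of the iris dataset.

```python
import pandas as pd

# Hypothetical columns: one identifier-style name and one that
# collides with the built-in DataFrame.count method.
df = pd.DataFrame({"sepal_length": [5.1, 4.9, 4.7], "count": [1, 2, 3]})

# Attribute access works for an identifier-style column name.
via_attribute = df.sepal_length
via_brackets = df["sepal_length"]
same_column = via_attribute.equals(via_brackets)

# df.count is the DataFrame method, not the column named "count".
count_is_method = callable(df.count)

# The [] operator still reaches the column.
count_column = df["count"]

print(same_column, count_is_method)
```

Because [] works for every column name, it is the safer choice whenever the names are not known in advance.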

Multiple column selection

You can select multiple columns by passing a list of column names to the [] operator.

Syntax:

DataFrame[['col_1', 'col_2', 'col_3', ..., 'col_n']]

# Multiple column selection.
data[['sepal length (cm)','petal length (cm)']]

Output

     sepal length (cm)  petal length (cm)
0                  5.1                1.4
1                  4.9                1.4
2                  4.7                1.3
3                  4.6                1.5
4                  5.0                1.4
..                 ...                ...
145                6.7                5.2
146                6.3                5.0
147                6.5                5.2
148                6.2                5.4
149                5.9                5.1

[150 rows x 2 columns]
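One detail worth seeing in code is the return type: a single name gives back a Series, while a list of names gives back a DataFrame, even when the list holds only one name. The sketch below assumes a tiny stand-in frame rather than the full iris data.

```python
import pandas as pd

# Stand-in frame with two of the iris column names.
df = pd.DataFrame(
    {
        "sepal length (cm)": [5.1, 4.9],
        "petal length (cm)": [1.4, 1.4],
    }
)

# A single label returns a Series.
single = df["sepal length (cm)"]

# A one-element list returns a one-column DataFrame.
as_frame = df[["sepal length (cm)"]]

# The columns come back in the order given in the list.
reordered = df[["petal length (cm)", "sepal length (cm)"]]

print(type(single).__name__, type(as_frame).__name__)
print(list(reordered.columns))
```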

Accessing Rows of DataFrame

Selecting rows from a Series or DataFrame with slicing works much like slicing an ndarray or a Python list. It returns a slice of the values together with their corresponding labels.

Let's start with selecting data from Series.

# Importing numpy library.
import numpy as np
# Create series.
arr = np.random.random(5)
s = pd.Series(arr)
s

Output

0    0.063901
1    0.751820
2    0.347128
3    0.925637
4    0.495227
dtype: float64

Select rows using the [] operator with row labels. Here, the row labels are the default index values, from 0 to len(s)-1.

Syntax:

For series: Series[index_label_value]

# selecting the value at index label 2 from the series.
>>> s[2]
0.3471284869083604

# using slicing 
>>> s[:] 
0    0.063901
1    0.751820
2    0.347128
3    0.925637
4    0.495227
dtype: float64

>>> s[1:3]
1    0.751820
2    0.347128
dtype: float64

>>> s[::2]
0    0.063901
2    0.347128
4    0.495227
dtype: float64
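To see how closely these slices mirror list and ndarray slicing, the sketch below applies the same slice to all three. It uses fixed values instead of random ones (an assumption, so the results are reproducible).

```python
import numpy as np
import pandas as pd

values = [10.0, 20.0, 30.0, 40.0, 50.0]
arr = np.array(values)
s = pd.Series(values)

# The same slice applied to a list, an ndarray, and a Series.
list_slice = values[1:3]
array_slice = arr[1:3]
series_slice = s[1:3]

# The Series slice keeps the labels of the rows it selected.
series_labels = list(series_slice.index)

# A step works the same way as for lists and arrays.
every_other = s[::2]

print(list_slice, array_slice.tolist(), series_slice.tolist())
print(series_labels, list(every_other.index))
```

The values match across all three, and the only difference is that the Series carries its index labels along with the values.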

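You cannot select a single row from a DataFrame by passing a single integer to the [] operator, because [] treats that value as a column label. Slicing, however, does select rows.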

# selecting rows from dataframe using slicing operator.
>>> data[1:3]
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2

>>> data[2:10:2]
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
2                4.7               3.2                1.3               0.2
4                5.0               3.6                1.4               0.2
6                4.6               3.4                1.4               0.3
8                4.4               2.9                1.4               0.2
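The difference between a single integer and a slice can be sketched as follows. The small frame is a stand-in for the iris data, and the variable names are only illustrative.

```python
import pandas as pd

# Stand-in frame with two of the iris column names.
df = pd.DataFrame(
    {
        "sepal length (cm)": [5.1, 4.9, 4.7],
        "sepal width (cm)": [3.5, 3.0, 3.2],
    }
)

# A single integer is looked up among the column labels,
# so it fails here because no column is named 1.
try:
    df[1]
    raised_key_error = False
except KeyError:
    raised_key_error = True

# A length-one slice selects the row at position 1 instead.
one_row = df[1:2]

print(raised_key_error)
print(one_row)
```

Note that the slice returns a one-row DataFrame, not a Series.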

These are the basic indexing methods for accessing the rows and columns of a DataFrame. Pandas also provides more explicit and flexible ways to access rows and columns: the label-based .loc and the integer-position-based .iloc indexers. We'll learn about them in detail in the next tutorial.

That's all for today's article. Thank you for reading. I hope it helps you.

See you in my next article. :)
