Indexing using the .iloc and .loc methods in Pandas (Part 2)
This article explains how to select data by position and by label.
Hello,
This is the second part of the series on indexing in Pandas. In the first part, we selected data using basic indexing, which is similar to selecting elements from Python lists or NumPy arrays. We also used the slicing operator to select rows and columns.
We can use the indexing operator [] or the attribute selector . to select particular columns. Basic indexing, however, becomes awkward when we want to filter rows by a condition and pick specific columns in the same step.
For that, pandas has two methods, iloc and loc, which select rows and columns by position and by label respectively. In this article, we will discuss different ways to use the iloc and loc methods to access or modify our data.
So, let's get started...
What are iloc and loc?
iloc
iloc is a purely position-based method: the indexer must be an integer for both rows and columns. Valid positions run from 0 to len(data)-1, and negative integers count back from the end.
syntax:
Series.iloc[indexer]
DataFrame.iloc[row_indexer, column_indexer]
loc
loc is a label-based method: the indexer is the label (name) of the rows and columns.
syntax:
Series.loc[indexer]
DataFrame.loc[row_indexer, column_indexer]
Let's create a Series and DataFrame first.
import pandas as pd
import numpy as np
# Creating series.
s = pd.Series(np.random.random(4))
s
0 0.142783
1 0.538737
2 0.177244
3 0.244888
dtype: float64
# Creating dataframe
df = pd.DataFrame(np.random.random((4, 4)), index=['a', 'b', 'c', 'd'], columns=['First', 'Second', 'Third', 'Four'])
df
First Second Third Four
a 0.897843 0.429562 0.224036 0.594005
b 0.910304 0.204563 0.788041 0.531814
c 0.478501 0.311965 0.682055 0.129311
d 0.632599 0.079228 0.882911 0.456732
.iloc method
Pandas provides a purely integer-based indexing method, iloc. Positions are 0-based. When slicing, the start position is included while the stop position is excluded. An IndexError is raised if a requested position is out of bounds.
# Selecting data from series
# single value.
>>> s.iloc[2]
0.177244
# Multiple rows using a list of values
>>> s.iloc[[1, 3, 0]]
1 0.538737
3 0.244888
0 0.142783
dtype: float64
# slice
>>> s.iloc[:3]
0 0.142783
1 0.538737
2 0.177244
dtype: float64
We can modify the values of a Series using the iloc method as well.
>>> s.iloc[:2] = 0
>>> s
0 0.000000
1 0.000000
2 0.177244
3 0.244888
dtype: float64
# Select rows and columns from DataFrame.
# slice
>>> df.iloc[:2, :2]
First Second
a 0.897843 0.429562
b 0.910304 0.204563
# list of values
>>> df.iloc[[1, 2], [0, 3]]
First Four
b 0.910304 0.531814
c 0.478501 0.129311
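The bounds rules described for iloc can be checked with a small sketch. The series below is hypothetical data chosen for illustration, not the random series used above. It shows that a slice excludes its stop position, that a single out-of-range position raises IndexError, and that an out-of-range slice is clipped instead of raising.

```python
import pandas as pd

# Hypothetical data, chosen only so the results are easy to read.
s = pd.Series([10, 20, 30, 40])

# Slicing excludes the stop position: only positions 1 and 2 are returned.
middle = s.iloc[1:3]
print(middle.tolist())  # [20, 30]

# Negative positions count from the end, as with Python lists.
last = s.iloc[-1]
print(last)  # 40

# A single out-of-range position raises IndexError.
try:
    s.iloc[10]
    raised = False
except IndexError:
    raised = True
print(raised)  # True

# An out-of-range slice does not raise; it is clipped to the rows that exist.
clipped = s.iloc[2:10]
print(clipped.tolist())  # [30, 40]
```

The difference between the last two cases is worth remembering: a scalar position must exist, while a slice quietly returns whatever falls inside the valid range.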
.loc method
This is one of the most useful methods for filtering data based on particular conditions. Every label must exist in the index, or a KeyError will be raised. When slicing, both the start and the end labels are included, if present in the index. Integers are valid labels, but they refer to the label and not the position.
The valid inputs for the loc method are:
- a single label ('a')
- a list of labels
- a slice object with labels ('a':'c'); both 'a' and 'c' are included
- a boolean array
- a condition
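Here is a minimal sketch of the label rules listed above, using hypothetical data (the index values and the column name x are made up for illustration). It contrasts an integer label with an integer position, shows that a label slice includes both endpoints, and confirms that a missing label raises KeyError.

```python
import pandas as pd

# Hypothetical series whose integer labels do not match the positions.
s = pd.Series([10, 20, 30], index=[2, 1, 0])

by_label = s.loc[0]      # the row whose label is 0
by_position = s.iloc[0]  # the row at position 0
print(by_label, by_position)  # 30 10

# Label slices include both endpoints.
df = pd.DataFrame({"x": [1, 2, 3, 4]}, index=["a", "b", "c", "d"])
rows = df.loc["a":"c"]
print(rows.index.tolist())  # ['a', 'b', 'c']

# A label that is not in the index raises KeyError.
try:
    df.loc["z"]
    missing_raises = False
except KeyError:
    missing_raises = True
print(missing_raises)  # True
```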
# DataFrame
# Slice.
# select particular rows and all columns
>>> df.loc['a':'c']
First Second Third Four
a 0.897843 0.429562 0.224036 0.594005
b 0.910304 0.204563 0.788041 0.531814
c 0.478501 0.311965 0.682055 0.129311
# select particular rows and particular columns
>>> df.loc['a':'c', ['First', 'Third']]
First Third
a 0.897843 0.224036
b 0.910304 0.788041
c 0.478501 0.682055
# select the rows that satisfy a condition (all columns)
>>> df.loc[ df.First > 0.7 ]
First Second Third Four
a 0.897843 0.429562 0.224036 0.594005
b 0.910304 0.204563 0.788041 0.531814
# select the rows that satisfy more than one condition (all columns)
>>> df.loc[(df.First > 0.7) & (df.Second >= 0.3)]
First Second Third Four
a 0.897843 0.429562 0.224036 0.594005
# modify particular column values based on condition.
>>> df.loc[df.Second > 0.3, ['First']] = 1
>>> df
First Second Third Four
a 1.000000 0.429562 0.224036 0.594005
b 0.910304 0.204563 0.788041 0.531814
c 1.000000 0.311965 0.682055 0.129311
d 0.632599 0.079228 0.882911 0.456732
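The examples above use loc on a DataFrame only. The Series.loc[indexer] form works the same way. The sketch below uses a hypothetical labelled series (values picked for readability) to show a single label, a list of labels, a condition, and a conditional assignment.

```python
import pandas as pd

# Hypothetical labelled series, not the random data used earlier.
s = pd.Series([0.2, 0.9, 0.5, 0.7], index=["a", "b", "c", "d"])

# Single label and list of labels (the list order is preserved).
one = s.loc["b"]
some = s.loc[["d", "a"]]
print(one)                  # 0.9
print(some.index.tolist())  # ['d', 'a']

# Condition: keep only the values greater than 0.6.
high = s.loc[s > 0.6]
print(high.index.tolist())  # ['b', 'd']

# Conditional assignment: set every value below 0.6 to zero.
s.loc[s < 0.6] = 0.0
print(s.tolist())  # [0.0, 0.9, 0.0, 0.7]
```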
loc and iloc are very useful pandas methods for data analysis. I use them a lot while doing Exploratory Data Analysis, and I'm sure you'll use them in your Data Science or Machine Learning journey. In the beginning, you may find these methods a little difficult to understand, but once you start using them in your projects you'll see how easy and efficient they are.
That's all for today's article. If you want to know more about this topic, you can read the pandas documentation here.
Thank you for reading, I hope this helps you.
See you in my next article. :)