pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data easy and intuitive.

Limitation of NumPy

NumPy arrays are fast, but NumPy lacks features needed for data analysis on relational data (data whose items are related to one another). Features missing from NumPy include:

  1. No convenient way to attach labels to data.
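To make the missing-label point concrete, here is a minimal sketch. The prices and the day labels are made up for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices for three days (illustrative values)
prices = np.array([101.5, 102.0, 99.8])

# NumPy: an element is reachable only by its integer position
second_np = prices[1]

# pandas: the same values, with a label attached to each one
labeled = pd.Series(prices, index=["mon", "tue", "wed"])
second_pd = labeled.loc["tue"]

print(second_np, second_pd)
```

Both lookups return the same value; the difference is that the pandas lookup says which observation is being read.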

Why pandas?

pandas is built on top of NumPy to make processing relational data easier.

Ingesting (obtaining and importing data), storing, pre-processing, summarising, and visualizing data can all be done effectively in pandas.

Size mutability: columns can be inserted into and deleted from a DataFrame.
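A small sketch of size mutability, assuming a toy DataFrame whose column names (open, close, change, day) are invented for this example:

```python
import pandas as pd

# Toy data; column names and values are illustrative only
df = pd.DataFrame({"open": [100, 102, 101], "close": [102, 101, 105]})

# Insert a new column computed from existing ones (appended at the end)
df["change"] = df["close"] - df["open"]

# Insert a column at a specific position (0 = first column)
df.insert(0, "day", ["mon", "tue", "wed"])

# Delete a column; drop returns a new DataFrame and leaves df untouched
without_open = df.drop(columns=["open"])

print(list(df.columns))
print(list(without_open.columns))
```

Note that drop does not modify df unless the result is assigned back to it.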

pandas makes it easy to reshape, merge, and join data sets.
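As a rough illustration of merging and reshaping, the sketch below joins two small tables on a shared key and then reshapes the result with melt. The tickers, prices, and sectors are placeholders, not real data.

```python
import pandas as pd

# Two toy tables sharing a "ticker" key (all values are placeholders)
prices = pd.DataFrame({"ticker": ["AAA", "BBB", "CCC"], "price": [10.0, 20.0, 30.0]})
sectors = pd.DataFrame({"ticker": ["AAA", "BBB"], "sector": ["tech", "energy"]})

# Merge/join: keep only the tickers present in both tables
merged = prices.merge(sectors, on="ticker", how="inner")

# Reshape: turn the wide table into a long (field, value) layout
long_form = merged.melt(id_vars="ticker", var_name="field", value_name="value")

print(merged)
print(long_form)
```

Changing how="inner" to how="left" would keep CCC as well, with NaN in its sector column.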

import pandas as pd

Creating a Series (a Series is a 1D labeled, homogeneously typed array)

s = pd.Series([1, 1, 2, 3, 5, 8, 13], index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])  ## Fibonacci series 🤗
print(s)
>>a 1
b 1
c 2
d 3
e 5
f 8
g 13
dtype: int64
## If no index is given, pandas automatically creates an integer index starting at 0.
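A quick sketch of the automatically created index, using the same kind of Series without an index argument:

```python
import pandas as pd

# No index passed, so pandas creates one: 0, 1, 2, ...
s_default = pd.Series([1, 1, 2, 3, 5])

print(s_default.index)          # a RangeIndex
print(list(s_default.index))    # the labels as a plain list
print(s_default.loc[3])         # here the label 3 is also position 3
```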

Indexing and selection of data (loc and iloc are the preferred accessors)

s.iloc[0:4] ## iloc selects by implicit integer position; the end of the slice (position 4) is excluded.
>>a 1
b 1
c 2
d 3
dtype: int64
s.loc["a":"d"] ## loc selects by index label, here from "a" to "d"; both ends are included.
>>a 1
b 1
c 2
d 3
dtype: int64
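The two slices return the same rows for different reasons: a position slice stops before its end, while a label slice includes it. A minimal sketch, rebuilding the same Series so it runs on its own:

```python
import pandas as pd

s = pd.Series([1, 1, 2, 3, 5, 8, 13], index=list("abcdefg"))

by_position = s.iloc[0:4]    # positions 0, 1, 2, 3 (end excluded)
by_label = s.loc["a":"d"]    # labels a, b, c, d (end included)

same = by_position.equals(by_label)

single = s.loc["f"]          # one value looked up by label
last = s.iloc[-1]            # one value looked up by position from the end

print(same, single, last)
```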

Reading the file

pd.read_csv loads a CSV file into a DataFrame; here the CSV file "nifty" is read into a DataFrame named nifty.
nifty.head() # displays the first 5 rows of the DataFrame.
nifty.tail() # displays the last 5 rows of the DataFrame.
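Since the "nifty" file itself is not included here, the sketch below reads a small in-memory CSV instead. The column names (Date, Open, Close) and every value are assumptions made up for the example; only pd.read_csv, head, and tail are the real pandas calls.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = """Date,Open,Close
2024-01-01,100,101
2024-01-02,101,103
2024-01-03,103,102
2024-01-04,102,104
2024-01-05,104,106
2024-01-08,106,105
"""

# read_csv accepts a path or any file-like object; parse_dates converts the column to datetimes
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

first_rows = sample.head()     # first 5 rows by default
last_two = sample.tail(2)      # pass a number to change how many rows are shown

print(first_rows)
print(last_two)
```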

Dealing with null (NaN) values

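import numpy as np                          ## needed for np.nan below
df.dropna()                                 ## drops every row that contains a null value
df.fillna(0)                                ## fills NaN values with zeros
df.fillna(df.mean(numeric_only=True))       ## fills NaN values with the mean of each numeric column
df.replace(np.nan, 0)                       ## replaces NaN values with zeros
df.replace(np.nan, df["column"].mean())     ## replaces NaN values with the mean of one column ("column" stands for a column name)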
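Here is a small runnable sketch of these options on a toy DataFrame. The columns a and b and their values are invented for the example.

```python
import numpy as np
import pandas as pd

# Toy data with one missing value per column
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 4.0, 6.0]})

dropped = df.dropna()               # keeps only rows with no NaN at all
zero_filled = df.fillna(0)          # every NaN becomes 0
mean_filled = df.fillna(df.mean())  # each NaN becomes its own column's mean

# None of these calls change df itself; they return new DataFrames
missing_per_column = df.isna().sum()

print(dropped)
print(mean_filled)
print(missing_per_column)
```

To keep a result, assign it back, for example df = df.fillna(0).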

Operations on a DataFrame

df.mean()   ## calculates the mean of each column
df.min()    ## displays the smallest value in each column
df.std()    ## displays the standard deviation of each column
df.median() ## displays the median of each column

All arithmetic operations are easily done on a DataFrame.
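A short sketch of these summaries and of element-wise arithmetic, again on made-up data with hypothetical columns x and y:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [10, 20, 30, 40]})

col_means = df.mean()       # one value per column
col_min = df.min()
col_median = df.median()
col_std = df.std()          # sample standard deviation (ddof=1 by default)

# Arithmetic works element-wise, aligned on row and column labels
df["total"] = df["x"] + df["y"]
doubled = df * 2

print(col_means)
print(col_std)
print(doubled)
```

df.describe() gives several of these statistics in a single table.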

Thank you for reading!!😀😀
