pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data easy and intuitive.
Limitations of NumPy
NumPy arrays are very fast, but NumPy lacks features needed for data analysis on relational data (data that are related to one another). A few of the features missing from NumPy are:
- No built-in way to attach labels to data.
- No pre-built methods to fill missing values.
- No built-in way to group data.
- No built-in way to pivot data.
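For contrast, here is a minimal sketch of how pandas handles the four items in the list above. The column names (`city`, `year`, `sales`) and all values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the column names and values are invented for this sketch.
df = pd.DataFrame({
    "city": ["Pune", "Pune", "Delhi", "Delhi"],
    "year": [2020, 2021, 2020, 2021],
    "sales": [10.0, np.nan, 30.0, 40.0],
})

# Labels: rows are addressed by city name instead of a bare integer position.
labeled = df.set_index("city")

# Missing values: fill the NaN in "sales" with 0.
filled = df.fillna({"sales": 0})

# Grouping: total sales per city.
totals = filled.groupby("city")["sales"].sum()

# Pivoting: one row per city, one column per year.
wide = filled.pivot(index="city", columns="year", values="sales")

print(totals)
print(wide)
```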
Why pandas?
pandas is built on top of NumPy to make data processing on relational data easier.
Ingesting (the process of obtaining and importing data), storing, pre-processing, summarizing, and visualizing data can all be done effectively in pandas.
Size mutability: columns can be inserted into and deleted from a DataFrame.
pandas makes it easy to reshape, merge, and join data sets.
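As a rough illustration of size mutability and merging, the sketch below adds and removes a column and then joins two small tables. The table names (`prices`, `volumes`), column names, and values are hypothetical.

```python
import pandas as pd

# Hypothetical tables; names and values are invented for this sketch.
prices = pd.DataFrame({"symbol": ["A", "B", "C"], "price": [100.0, 250.0, 80.0]})
volumes = pd.DataFrame({"symbol": ["A", "B", "D"], "volume": [10, 20, 30]})

# Size mutability: insert a column, then delete it again.
prices["price_x2"] = prices["price"] * 2
prices = prices.drop(columns=["price_x2"])

# Merging: an inner join on "symbol" keeps only the symbols present in both tables.
merged = prices.merge(volumes, on="symbol", how="inner")
print(merged)
```

Changing `how` to `"left"`, `"right"`, or `"outer"` keeps unmatched rows from one or both tables instead of dropping them.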
import pandas as pd
Creating a Series (a Series is a one-dimensional, labeled array whose values share a single data type)
s = pd.Series([1, 1, 2, 3, 5, 8, 13], index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])  # Fibonacci series 🤗
print(s)
# If no index is given, pandas automatically creates a default integer index (0, 1, 2, ...).
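A small sketch of the default index, using the same Fibonacci values without an `index` argument. The variable name `unlabeled` is only for illustration.

```python
import pandas as pd

# No index argument: pandas assigns the labels 0, 1, 2, ... automatically.
unlabeled = pd.Series([1, 1, 2, 3, 5, 8, 13])

print(unlabeled.index)        # a RangeIndex rather than explicit labels
print(list(unlabeled.index))
```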
Indexing and selecting data (loc and iloc are the preferred accessors)
s.iloc[0:4]  # iloc selects by integer position: positions 0 to 3 (the end position 4 is excluded).
s.loc["a":"d"]  # loc selects by index label: labels "a" through "d" (the end label "d" is included).
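The two slices above happen to select the same four values, which can hide the difference in endpoint handling. The following sketch rebuilds the same Series so that it runs on its own and makes the comparison explicit.

```python
import pandas as pd

s = pd.Series([1, 1, 2, 3, 5, 8, 13], index=["a", "b", "c", "d", "e", "f", "g"])

by_position = s.iloc[0:4]   # positions 0, 1, 2, 3; position 4 is excluded
by_label = s.loc["a":"d"]   # labels "a", "b", "c", "d"; the end label is included

print(by_position.tolist())
print(by_label.tolist())
print(by_position.equals(by_label))
```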
Reading a file
This reads the CSV file "nifty" into a DataFrame.
nifty.head()  # display the first 5 rows of the DataFrame.
nifty.tail()  # display the last 5 rows of the DataFrame.
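The line that loads the file is not shown here, so the following is only a sketch of one common way to do it, assuming `pd.read_csv`. To keep the example self-contained it reads from an in-memory string instead of a file on disk, and the columns and values are invented, not the real contents of the "nifty" file.

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk; the columns and values are invented.
csv_text = """Date,Open,Close
2021-01-01,100,101
2021-01-04,101,103
2021-01-05,103,102
2021-01-06,102,105
2021-01-07,105,104
2021-01-08,104,108
2021-01-11,108,107
"""

# pd.read_csv accepts a file path or any file-like object.
nifty = pd.read_csv(io.StringIO(csv_text))

print(nifty.head())    # first 5 rows
print(nifty.tail(2))   # last 2 rows; head and tail take an optional row count
```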
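Dealing with null (NaN) values
df.dropna()  # drop every row that contains at least one null value
df.fillna(0)  # fill NaN values with zeros
df.fillna(df.mean(numeric_only=True))  # fill NaN values with the mean of their own column
df.replace(np.nan, 0)  # replace NaN values with 0 (needs: import numpy as np)
df.replace(np.nan, df.column.mean())  # replace every NaN with the mean of one column ("column" stands for that column's name)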
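Here is a minimal, runnable sketch of these calls on a tiny DataFrame. The column names `a` and `b` and the values are made up. Each call returns a new DataFrame and leaves `df` itself unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one NaN in each column.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 4.0, 8.0]})

dropped = df.dropna()              # keeps only rows with no NaN at all
zeros = df.fillna(0)               # every NaN becomes 0
col_means = df.fillna(df.mean())   # each NaN becomes the mean of its own column

print(dropped)
print(zeros)
print(col_means)
```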
Operations on a DataFrame
df.mean()  # calculate the mean of each column
df.min()  # display the smallest value in each column
df.std()  # display the standard deviation of each column
df.median()  # display the median of each column
All arithmetic operations are easy to perform on a DataFrame.
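A short sketch of these summary methods and of element-wise arithmetic follows. The column names `x` and `y` and the values are invented for illustration.

```python
import pandas as pd

# Hypothetical numeric data.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [10.0, 20.0, 30.0, 100.0]})

print(df.mean())     # x: 2.5, y: 40.0
print(df.min())      # x: 1.0, y: 10.0
print(df.median())   # x: 2.5, y: 25.0
print(df.std())      # sample standard deviation (ddof=1 by default)

# Arithmetic is element-wise and aligned on row and column labels.
df["total"] = df["x"] + df["y"]
scaled = df * 2
print(scaled)
```

Note that `y` has a mean of 40.0 but a median of 25.0, because the single large value 100.0 pulls the mean upward.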
Thank you for reading!!😀😀