PANDAS

pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive.

Limitations of NumPy

For performance, NumPy arrays are significantly faster than plain Python lists, but NumPy lacks the features needed to analyze relational data (data whose records are related to one another). A few of the features missing from NumPy are:

  1. No way to attach labels to data.
  2. No pre-built methods to fill missing values.
  3. No way to group data.
  4. No way to pivot data.
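To make the list above concrete, here is a minimal sketch of the same four features in pandas. The `city`/`year`/`sales` table is made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data, made up for illustration only
df = pd.DataFrame({
    "city": ["Delhi", "Delhi", "Mumbai", "Mumbai"],
    "year": [2020, 2021, 2020, 2021],
    "sales": [100.0, np.nan, 80.0, 120.0],
})

# 1. Labels: a column is addressed by name, not by position
print(df["sales"])

# 2. Missing values: fill NaN with a built-in method
df["sales"] = df["sales"].fillna(0)

# 3. Grouping: total sales per city
totals = df.groupby("city")["sales"].sum()
print(totals)

# 4. Pivoting: cities as rows, years as columns
wide = df.pivot(index="city", columns="year", values="sales")
print(wide)
```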

Why pandas?

pandas is built on top of NumPy to make processing relational data easier.

Ingesting (the process of obtaining and importing data), storing, pre-processing, summarising, and visualizing data can all be done effectively in pandas.

Size mutability: columns can be inserted into and deleted from a DataFrame.
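A short sketch of size mutability, using a small made-up DataFrame: columns are added by assignment or with `DataFrame.insert`, and removed with `del` or `DataFrame.drop`.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Add a column by assigning to a new label (it is appended at the end)
df["c"] = df["a"] + df["b"]

# Add a column at a chosen position: insert(loc, column, value)
df.insert(0, "id", [101, 102, 103])

# Delete columns: `del` works in place, drop() returns a new DataFrame
del df["b"]
df = df.drop(columns=["c"])

print(df.columns.tolist())  # ['id', 'a']
```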

pandas makes it easy to reshape, merge, and join data sets.
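As a minimal sketch of merging, concatenating, and reshaping, here are two hypothetical tables that share a `key` column (names and values invented for the example):

```python
import pandas as pd

# Two small hypothetical tables sharing a "key" column
left = pd.DataFrame({"key": ["A", "B", "C"], "price": [10, 20, 30]})
right = pd.DataFrame({"key": ["A", "B", "D"], "qty": [1, 2, 4]})

# SQL-style join on "key"; how="inner" keeps only keys present in both tables
merged = pd.merge(left, right, on="key", how="inner")
print(merged)

# Stack the rows of frames that have the same columns
combined = pd.concat([left, left], ignore_index=True)

# Reshape wide -> long: one row per (key, column) pair
long = merged.melt(id_vars="key", value_vars=["price", "qty"])
print(long)
```

Switching `how` to `"left"`, `"right"`, or `"outer"` controls which unmatched keys (here `C` and `D`) are kept.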

import pandas as pd

Creating a Series (a Series is a one-dimensional, labeled, homogeneously typed array)

s = pd.Series([1, 1, 2, 3, 5, 8, 13], index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])  ## Fibonacci series 🤗
print(s)
>>a     1
b     1
c     2
d     3
e     5
f     8
g    13
dtype: int64
## If no index is given, pandas automatically creates a default integer index 0, 1, 2, and so on.
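A quick check of the default index, plus what "homogeneously typed" means in practice: a Series stores a single dtype, so mixing integers and floats upcasts everything to float.

```python
import pandas as pd

# No index passed, so pandas builds a default RangeIndex: 0, 1, 2, ...
s = pd.Series([1, 1, 2, 3, 5, 8, 13])
print(s.index)

# One dtype per Series: ints mixed with a float become float64
mixed = pd.Series([1, 2, 3.5])
print(mixed.dtype)  # float64
```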

Indexing and selecting data (prefer .loc and .iloc)

s.iloc[0:4]  ## iloc: selects by integer position; the stop position 4 is excluded
>>a 1
b 1
c 2
d 3
dtype: int64
s.loc["a":"d"]  ## loc: selects by index label; both "a" and "d" are included
>>a 1
b 1
c 2
d 3
dtype: int64
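The main practical difference between the two is the slice endpoint: `iloc` excludes it (like ordinary Python slicing), while `loc` includes it. The sketch below rebuilds the same Fibonacci Series and also shows single-element and boolean selection.

```python
import pandas as pd

s = pd.Series([1, 1, 2, 3, 5, 8, 13], index=list("abcdefg"))

# Position-based: positions 0..3, stop position 4 excluded
by_position = s.iloc[0:4]

# Label-based: labels "a" through "d", both ends included
by_label = s.loc["a":"d"]

# Single elements
third = s.iloc[2]     # value at position 2
g_value = s.loc["g"]  # value at label "g"

# loc also accepts a boolean mask: keep only the even values
evens = s.loc[s % 2 == 0]
print(evens)
```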

Reading a file

nifty = pd.read_csv("nifty.csv", index_col=0)  ## read nifty.csv into a DataFrame, using the first column as the row index
nifty.head()  ## display the first 5 rows of the DataFrame
nifty.tail()  ## display the last 5 rows of the DataFrame
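Since nifty.csv itself isn't included here, the sketch below feeds `read_csv` a tiny in-memory CSV instead. The `Date`/`Close` columns and their values are invented placeholders, not real market data; with the actual file you would pass the path as above.

```python
import io
import pandas as pd

# Stand-in for nifty.csv: made-up values, NOT real market data
csv_text = """Date,Close
2021-01-01,100.0
2021-01-04,101.5
2021-01-05,99.0
2021-01-06,102.0
2021-01-07,103.5
2021-01-08,104.0
"""

# index_col=0 makes the first column (Date) the index; parse_dates=True turns it into dates
nifty = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

print(nifty.head())   # first 5 rows (n defaults to 5)
print(nifty.tail(2))  # last 2 rows
print(nifty.shape)    # (rows, columns)
```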

Dealing with null (NaN) values

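import numpy as np

## each call below returns a new DataFrame; df itself is unchanged
df.dropna()  ## drop every row that contains at least one NaN
df.fillna(0)  ## fill NaN values with zeros
df.fillna(df.mean(numeric_only=True))  ## fill NaN values with each numeric column's mean
df.replace(np.nan, 0)  ## replace NaN values with 0
df.replace(np.nan, df["column"].mean())  ## replace NaN with the mean of one column ("column" stands for a column name)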
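Here is a runnable version of the calls above on a small made-up DataFrame, so the effect of each one is visible:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [np.nan, 5.0, 7.0]})

missing = df.isna().sum()                # number of NaNs per column
dropped = df.dropna()                    # only row 2 has no NaN, so only it survives
zeros = df.fillna(0)                     # every NaN becomes 0
means = df.fillna(df.mean())             # each NaN becomes its column's mean (x: 2.0, y: 6.0)
x_only = df["x"].fillna(df["x"].mean())  # fill a single column

print(means)
```

Because each call returns a new object, assign the result (for example `df = df.fillna(0)`) when you want to keep it.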

Operations on a DataFrame

df.mean()  ## mean of each column
df.min()  ## smallest value in each column
df.std()  ## standard deviation of each column
df.median()  ## median of each column

All the usual arithmetic operations are easy to do on a DataFrame.
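A minimal sketch of these column-wise statistics and of element-wise arithmetic, on a small made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

col_means = df.mean()      # a: 2.5, b: 25.0
col_mins = df.min()        # a: 1, b: 10
col_std = df.std()         # sample standard deviation (ddof=1) per column
col_medians = df.median()  # a: 2.5, b: 25.0

# Arithmetic is element-wise and aligned on labels
df["total"] = df["a"] + df["b"]  # 11, 22, 33, 44
scaled = df * 2                  # every value doubled

# describe() reports count, mean, std, min, quartiles and max in one call
print(df.describe())
```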

Thank you for reading!!😀😀


The price of “anything” is the amount of “time”, U xchange for it. Education | Technology | Data Science | Statistics | History

Abu Qais
