My attempt (2) at learning a coding language in my mid 30s

This is the second compilation of my notes from learning Python.

-Use "\n" inside a print command to start a new line.

eg1. print ("First line \nSecond line") gives

First line
Second line

It does not work with /n though. When I changed the slash direction, I got First line /nSecond line printed exactly as typed.

-Use \t to create a tab.
-Use \' and \" to show ' or " without Python treating it as the start or end of a string, and use \\ to print a \.
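To recap the escape characters above in one place, here is a small sketch. The variable names and sample strings are only made up for illustration.

```python
# Each string uses one escape sequence from the notes above.
two_lines = "First line \nSecond line"   # \n starts a new line
tabbed = "Name\tAge"                     # \t inserts a tab
quoted = 'It\'s a \"quoted\" word'       # \' and \" print the quote characters
backslash = "C:\\temp"                   # \\ prints a single backslash
forward = "First line /nSecond line"     # /n is not an escape, printed as typed

for text in (two_lines, tabbed, quoted, backslash, forward):
    print(text)
```

Only the backslash versions are treated specially. The forward slash version is just two ordinary characters.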

-Indentation. Python is really particular about indentation. The code inside an if/else block (or inside a loop) has to be indented, else it will run into errors. The good thing is that this usually shows up in the error message and can be easily rectified.
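Here is a minimal sketch of the indentation rule. The value of x and the text in bad_code are hypothetical, chosen only to show one correctly indented block and one that Python rejects.

```python
x = 7  # hypothetical value, chosen only for illustration

if x > 5:
    # Lines indented under "if" run only when the condition is true.
    result = "big"
else:
    # Lines indented under "else" run otherwise.
    result = "small"

# This line is not indented, so it runs in either case.
print(result)

# The same "if" with the indentation missing. Python refuses to run it
# and raises an IndentationError that describes the problem.
bad_code = "if x > 5:\nprint('big')"
try:
    compile(bad_code, "<example>", "exec")
except IndentationError as err:
    message = str(err)
    print("IndentationError:", message)
```

I used compile here only so that the broken snippet can sit inside a working script. Typing the broken code directly gives the same kind of error.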

-Iterative loop: For and While

The for loop runs once for each item in a set sequence (for example, each of the n numbers in a range), but the while loop keeps running as long as its condition is satisfied and stops once the condition is no longer met.

eg2. x = 0
for x in range(2,7):
        print(x)

This will print 2,3,4,5,6 and stop, as it has run through all the numbers from 2 up to but not including 7.
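A quick way to check which numbers a range covers is to turn it into a list. This is just a small sketch using the two ranges from these notes.

```python
# range(start, stop) includes the start value but leaves out the stop value.
first = list(range(2, 7))
second = list(range(10, 20))

print(first)
print(len(second), "numbers, ending at", second[-1])
```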

eg3. x=0
while x < 4:
        print(x)
        x = x+1

This will print the values of x from 0, 1, 2, 3. When x = 4, the condition x < 4 is no longer satisfied and the loop exits.

The while loop is more dangerous: if you forget to update the value of x (missing the last line, x = x+1), the loop will run forever since x stays at 0 and is always less than 4. Use break inside the while loop to exit it even while the condition is still satisfied.
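The break idea can be sketched like this. The list named seen is my own addition so the result can be checked at the end. It is not needed for the loop to work.

```python
x = 0
seen = []            # hypothetical helper list to record the values

while True:          # this condition alone would never end the loop
    if x == 4:
        break        # leave the loop once x reaches 4
    seen.append(x)
    x = x + 1

print(seen)
```

Without the break (or without x = x + 1) this loop would never stop, so it is worth checking both before running it.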

-Using break and continue in a for loop

eg4. for x in range (10,20):
            if (x ==15): break
            print (x)

This will give 10,11,12,13,14. Once x = 15, you exit the loop.

eg5. for x in range (10,20):
            if (x % 5 == 0): continue
            print (x)

This will print 11,12,13,14,16,17,18,19. 10 and 15 are not printed out as they are multiples of 5.

As I was learning Python for a data analyst role, I was introduced to Pandas and NumPy. These are packages (also called libraries) for scientific computing that make it possible to work with large data sets.

By convention, Pandas is imported as pd and NumPy as np. The packages must be imported first before we can use the code inside Pandas and NumPy.

eg 6.    import pandas as pd
            import numpy as np

-Create the dataframe (usually named df). A dataframe is data arranged in rows and columns, shown as a table in Python. There are 3 ways to create this table by typing in the data, and a 4th way that loads it from a file.

Method 1) create df from a list. Lists are created using square brackets []. A table has column headings and data, so you need to create two lists to form the table.

eg7. salesunit= [[1, 23,'Sofa'], [6, 5,'Chair'], [8, 1,'Table']]         
colNames = ['Month', 'Number', 'Item']
df = pd.DataFrame(data = salesunit, columns=colNames)
df # in a notebook this line is required to show the table; in a script, use print(df) instead

Method 2) create df from NumPy arrays. An array is a grid of values. A list like colNames can be thought of as a 1-dimensional array. salesunit, however, is 2-dimensional (a matrix): the code starts with np.array([ followed by the 1D arrays inside, one per row. A 3D array starts with np.array([ followed by the 2D arrays inside, eg np.array([ [[123], [456]], [[123], [456]] ]).

eg8. salesunit= np.array([[1, 23,'Sofa'], [6, 5,'Chair'], [8, 1,'Table']])   
colNames = ['Month', 'Number', 'Item']
df = pd.DataFrame(data = salesunit, columns=colNames)
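One thing to watch with eg8: a NumPy array holds a single data type, so mixing numbers and text in one array turns the numbers into text as well. The sketch below shows this and one possible way to convert the columns back. Using pd.to_numeric is my own choice here, not the only way to do it.

```python
import numpy as np
import pandas as pd

salesunit = np.array([[1, 23, 'Sofa'], [6, 5, 'Chair'], [8, 1, 'Table']])
colNames = ['Month', 'Number', 'Item']
df = pd.DataFrame(data=salesunit, columns=colNames)

# The numbers were stored as text because the array also holds words.
first_month = df['Month'].iloc[0]
print(repr(first_month))

# Convert the two numeric columns back to numbers.
df['Month'] = pd.to_numeric(df['Month'])
df['Number'] = pd.to_numeric(df['Number'])
print(df.dtypes)
```

After the conversion, sums and other calculations on Month and Number work as expected.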

Method 3) create df from a Python dictionary. A Python dictionary is a collection of items made of key and value pairs, written with curly brackets {}. Here each column name is a key, and its value is another dictionary that maps the row index to the data.

salesunit = {'Month': {0: 1 , 1: 6, 2: 8}, 'Number':{0:23, 1: 5, 2:1}, 'Item':{0:'Sofa', 1:'Chair', 2:'Table'}}
df = pd.DataFrame(data = salesunit, columns=colNames)
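A shorter form of the dictionary method, as far as I can tell, is to give each column a plain list and let Pandas fill in the index 0, 1, 2 by itself. The sketch below builds the table both ways and compares them. The names nested, by_list and same are made up for this example.

```python
import pandas as pd

# Dictionary of dictionaries: the row index is typed in by hand.
nested = {'Month': {0: 1, 1: 6, 2: 8},
          'Number': {0: 23, 1: 5, 2: 1},
          'Item': {0: 'Sofa', 1: 'Chair', 2: 'Table'}}

# Dictionary of lists: the row index is created automatically.
by_list = {'Month': [1, 6, 8],
           'Number': [23, 5, 1],
           'Item': ['Sofa', 'Chair', 'Table']}

df_nested = pd.DataFrame(data=nested)
df_list = pd.DataFrame(data=by_list)
print(df_list)

# Compare the contents of the two tables column by column.
same = df_nested.to_dict(orient='list') == df_list.to_dict(orient='list')
print(same)
```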

These are all tedious methods as you need to input the data line by line. 

Method 4) load df from a file in the csv format or the Excel xlsx format.

filename = 'data.csv'

df = pd.read_csv(filename)

filename = 'data.xlsx'

df = pd.read_excel(filename)
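Since data.csv is not included with this post, here is a sketch that can be tried without any file. It assumes the csv has the same three columns as the earlier examples, and it uses io.StringIO to make a piece of text behave like an open file, which read_csv also accepts.

```python
import io
import pandas as pd

# Hypothetical file contents, standing in for data.csv.
csv_text = "Month,Number,Item\n1,23,Sofa\n6,5,Chair\n8,1,Table\n"

# read_csv accepts a file name or a file-like object such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

The first line of the csv becomes the column headings, so there is no need for a separate colNames list.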

-Commands for reading the dataframe

df.head() # Select top N number of records (default = 5). We can change the top N number by adding the number inside the bracket eg df.head(10) will return the top 10 records. This gives us a quick glance at the record at hand.  

df.tail() # Select bottom N number of records (default = 5)

df.dtypes # Check the column data types using the dtypes attribute

df.shape # Use the shape attribute to get the number of rows and columns in your dataframe

df.info() # The info method gives the column datatypes + number of non-null values

This is very useful as it gives a quick glance at the total number and type of records under each header. It allows us to detect null values before processing: if the total number of records and the non-null count under a header do not match up, that column has missing values.
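The reading commands can be tried together on a small table. The table below is hypothetical: I added a fourth row and left one Item empty on purpose so that the non-null count is different from the number of rows. The count method is one more way to get the non-null numbers.

```python
import pandas as pd

df = pd.DataFrame({
    'Month': [1, 6, 8, 9],
    'Number': [23, 5, 1, 4],
    'Item': ['Sofa', 'Chair', None, 'Table'],  # one missing value on purpose
})

print(df.head(2))   # top 2 records
print(df.tail(2))   # bottom 2 records
print(df.shape)     # (rows, columns)
print(df.dtypes)    # data type of each column
df.info()           # data types plus non-null counts, printed directly

# count() gives the non-null count per column as numbers we can compare.
non_null = df.count()
rows = df.shape[0]
print(non_null['Item'], "of", rows, "records have an Item")
```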

df[['Month']] # Select one column using double brackets. This returns only the 'Month' column and type(df[['Month']]) returns a dataframe object. 

df[['Number', 'Item']] # Select multiple columns using double brackets. This returns the two columns 'Number' and 'Item'.

df['Month'] # Select one column using single brackets. This returns the values without the table-style column header, as a Series object.
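The difference between double and single brackets can be checked with type. This sketch rebuilds the small sales table from the earlier examples so it can run by itself. The variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Month': [1, 6, 8],
                   'Number': [23, 5, 1],
                   'Item': ['Sofa', 'Chair', 'Table']})

one_col_df = df[['Month']]         # double brackets: still a dataframe
two_col_df = df[['Number', 'Item']]
month_series = df['Month']         # single brackets: a Series

print(type(one_col_df).__name__)
print(type(month_series).__name__)
print(list(two_col_df.columns))
```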

It is a much faster experience watching videos about Python coding. Writing this compilation takes up more time since I try the code again and write an explanation of the output. Nevertheless, this forces me to recap the code and understand the output of each line. I shall continue to explore Pandas and NumPy in the next post.