Happy Face

Enumerate "Data" Big Idea from College Board

Some of the big ideas and vocab that you observe, talk about it with a partner ...

  • "Data compression is the reduction of the number of bits needed to represent data"
  • "Data compression is used to save transmission time and storage space."
  • "Lossy compression can reduce data, but the original data is not recovered"
  • "Lossless compression lets you restore and recover the original data"

The Image Lab Project contains a plethora of College Board Unit 2 data concepts. Working with Images provides many opportunities for compression and analyzing size.
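
The lossless and lossy ideas listed above can be sketched with a few lines of Python. This is only an illustration, not part of the Image Lab: the sample bytes, the `quantize` helper, and the choice to drop the low 4 bits are all assumptions made for the example.

```python
import zlib

# Hypothetical sample data: a highly repetitive byte string compresses well.
original = b"ABABABAB" * 100

# Lossless: zlib round-trips to exactly the original bytes.
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)
print(len(original), "bytes ->", len(compressed), "bytes")
print("lossless round trip:", restored == original)

# Lossy (illustrative only): keep the top 4 bits of each 8-bit value.
def quantize(values):
    """Drop the low 4 bits of each value; the removed detail cannot be restored."""
    return [v & 0xF0 for v in values]

samples = [200, 201, 202, 203]
lossy = quantize(samples)
print(samples, "->", lossy)  # four different values collapse to the same value
```

After quantizing, the four distinct samples are indistinguishable, which is why lossy compression cannot recover the original data.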

Image Files and Size

Here are some image files. Download these files and load them into the images directory under _notebooks in your Blog.

Describe some of the meta data and considerations when managing Image files. Describe how these relate to Data Compression ...

  • File Type, PNG and JPG are two types used in this lab
  • Size, height and width, number of pixels
  • Visual perception, lossy compression

Displaying images in Python Jupyter notebook

Python Libraries and Concepts used for Jupyter and Files/Directories

IPython

IPython supports visualization of data in Jupyter notebooks. Visualization is specific to the view; for the web, the visualization needs to be converted to HTML.

pathlib

File paths are written differently on Windows versus Mac and Linux. This can cause problems in a project as you work and deploy on different Operating Systems (OS's). pathlib is a solution to this problem.

  • What are commands you use in terminal to access files? Some commands that allow you to access files from the terminal are commands like, ls, cd, grep, and cat. With ls being the command that allows you to see the files in the directory that you are currently in. Cd is the command that allows you to change directories. Grep is the command that allows you to search for a specific word in a file. Cat is the command that allows you to see the contents of a file.
  • What are the commands you use in Windows terminal to access files? Some commands like dir, cd, find, and type are used to access files in the Windows terminal. Dir is the command that allows you to see the files in the directory that you are currently in. Cd is the command that allows you to change directories. Find is the command that allows you to search for a specific word in a file. Type is the command that allows you to see the contents of a file.
  • What are some of the major differences? The major differences are the names of the commands; their functionality is largely the same. Paths are also written differently: Windows separates directories with a backslash (\), while Mac and Linux use a forward slash (/).
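
As a small sketch of why pathlib helps, the snippet below builds the same relative path in both flavors using the pure path classes, which work on any operating system. The file name is borrowed from the lab; everything else is illustrative.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The same relative location, built with the "/" operator on each path flavor.
posix_path = PurePosixPath("images") / "clouds-impression.png"
windows_path = PureWindowsPath("images") / "clouds-impression.png"

print(posix_path)    # images/clouds-impression.png
print(windows_path)  # images\clouds-impression.png

# The parts are identical; only the separator used when printing differs.
print(posix_path.parts == windows_path.parts)
```

In the lab code, `Path("images/") / image['file']` relies on the same "/" operator, and `Path` picks the right flavor for the machine the notebook runs on.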

Provide what you observed, struggled with, or learned while playing with this code.

  • Why is path a big deal when working with images? Path is a big deal when working with images because it is the way that the computer knows where to find the image. If the path is wrong, the computer will not be able to find the image and will not be able to display it. Also the different operating systems have different ways of writing the path, so if you are working on a mac and you try to run the code on a windows computer, it will not work because the path is written differently.
  • How does the meta data source and label relate to Unit 5 topics? The meta data is the data associated with the image, which can be used to identify it. The source identifies where the image came from (its creator or origin) and the label is the name of the image. This relates to Unit 5 because this data can support greater functionality, such as knowing where the image was taken, who took it, etc.
  • Look up IPython, describe why this is interesting in Jupyter Notebooks for both Pandas and Images? IPython is interesting because it allows you to display images as the output of a cell, so you can view the changes being made to an image as soon as the cell runs.
from IPython.display import Image, display
from pathlib import Path  # https://medium.com/@ageitgey/python-3-quick-tip-the-easy-way-to-deal-with-file-paths-on-windows-mac-and-linux-11a072b58d5f

# prepares a series of images
def image_data(path=Path("images/"), images=None):  # path of static images is defaulted
    if images is None:  # default image
        images = [
            {'source': "Peter Carolin", 'label': "Clouds Impression", 'file': "clouds-impression.png"},
            {'source': "Peter Carolin", 'label': "Lassen Volcano", 'file': "lassen-volcano.jpg"}
        ]
    for image in images:
        # File to open
        image['filename'] = path / image['file']  # file with path
    return images

def image_display(images):
    for image in images:  
        display(Image(filename=image['filename']))


# Run this as standalone tester to see sample data printed in Jupyter terminal
if __name__ == "__main__":
    # print parameter supplied image
    green_square = image_data(images=[{'source': "Internet", 'label': "Green Square", 'file': "green-square-16.png"}])
    image_display(green_square)
    
    # display default images from image_data()
    default_images = image_data()
    image_display(default_images)
    

Reading and Encoding Images (2 implementations follow)

PIL (Python Image Library)

Pillow or PIL provides the ability to work with images in Python. Geeks for Geeks shows some ideas on working with images.

base64

Image formats (JPG, PNG) are binary file formats, and raw binary data is difficult to embed directly in text such as HTML, CSS, or JSON. Base64 solves this by re-encoding binary data as text: every 3 bytes (24 bits) are split into four 6-bit groups, and each group is written as one of 64 printable characters (A-Z, a-z, 0-9, +, /). Thus Base64 is used to transport and embed binary images in textual assets such as HTML and CSS.

  • How is Base64 similar or different to Binary and Hexadecimal? Binary is a numbering system that uses only two digits, 0 and 1, to represent numbers. In computers, binary is used to represent data and instructions using electrical signals, where a "0" is represented by no electrical signal and a "1" is represented by an electrical signal. Hexadecimal is a numbering system that uses 16 digits, 0 to 9 and A to F. It is often used in computing as a shorthand for binary, as one hexadecimal digit represents four binary digits. Base64 uses 64 symbols, so each character represents six binary digits. They are similar in that all three are ways of writing the same underlying bits. They differ in how many symbols they use, and therefore in how many bits each symbol carries.
  • Translate first 3 letters of your name to Base64. The first three letters of my name, "tir", are 746972 in hexadecimal and dGly in Base64.
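
A short sketch using only the standard library shows the same three letters in binary, hexadecimal, and Base64, so the three notations can be compared side by side. The variable names are illustrative.

```python
import base64

text = "tir"
raw = text.encode("utf-8")                      # 3 bytes = 24 bits

binary = " ".join(f"{byte:08b}" for byte in raw)
hexadecimal = raw.hex()
b64 = base64.b64encode(raw).decode("ascii")     # 24 bits -> four 6-bit characters

print(binary)       # 01110100 01101001 01110010
print(hexadecimal)  # 746972
print(b64)          # dGly
```

Regrouping the 24 bits as 011101 000110 100101 110010 gives the values 29, 6, 37, and 50, which map to the Base64 characters d, G, l, and y.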

numpy

Numpy is described as "The fundamental package for scientific computing with Python". In the Image Lab, a Numpy array is created from the image data in order to simplify access and change to the RGB values of the pixels, converting pixels to grey scale.

io, BytesIO

Input and Output (I/O) is fundamental to all computer programming. I/O buffering is a technique used to optimize I/O operations. With large quantities of data, the buffer is the portion of input that is currently queued and waiting to be processed. In this example, there is a very large picture that lags while it loads.

  • Where have you been a consumer of buffering? I have been a consumer of buffering on large streaming platforms such as YouTube, Spotify, and Amazon Prime. These platforms have a lot of data that needs to be sent to the user, and it takes time for that data to arrive.
  • From your consumer experience, what effects have you experienced from buffering? The video will play in chunks, with a pause between chunks while the platform fetches additional data. Buffering also keeps a briefly poor connection from affecting playback of the content.
  • How do these effects apply to images? The effects of buffering apply to images in the same way that they apply to videos. The image will be displayed in chunks, with some portions loading before others, until the entire image can be displayed.
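
The chunk-by-chunk behavior described above can be sketched with BytesIO, which acts as an in-memory file. The payload and the 4096-byte chunk size are assumptions chosen for the example, not values used by the lab.

```python
from io import BytesIO

# Hypothetical payload standing in for a large image file.
payload = bytes(range(256)) * 40          # 10,240 bytes
source = BytesIO(payload)                 # in-memory file-like object

CHUNK_SIZE = 4096                         # assumed buffer size for the sketch
received = bytearray()
chunk_sizes = []

# Read one chunk at a time until read() returns an empty bytes object.
while True:
    chunk = source.read(CHUNK_SIZE)
    if not chunk:
        break
    chunk_sizes.append(len(chunk))
    received.extend(chunk)

print(chunk_sizes)                  # [4096, 4096, 2048]
print(bytes(received) == payload)   # True: all data arrived, just in pieces
```
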
text = "tir"
hex_text = text.encode("utf-8").hex()
print(hex_text)
746972

Data Structures, Imperative Programming Style, and working with Images

Introduction to creating meta data and manipulating images. Look at each procedure and explain the purpose and results of this program. Add any insights or challenges as you explored this program.

  • Does this code seem like a series of steps are being performed? Yes. The program performs a series of steps in order: it opens each image, records its meta data, scales it, converts it to Base64 for display, and then converts the pixels from color to grey scale.
  • Describe Grey Scale algorithm in English or Pseudo code? This grey scale algorithm takes a color image represented as a numpy array, where each pixel contains an RGB tuple (3 values per pixel) or an RGBA tuple (4 values per pixel), and converts it into a grey scale image. For each pixel, the algorithm takes the integer average of the three RGB values and sets the red, green, and blue values to that average. For RGBA pixels the fourth value (alpha, or transparency) is not part of the average and is kept unchanged. This results in a grey pixel, since all three color channels hold the same value. Finally, the resulting grey scale image is saved from the PIL image and encoded as a base64 string, which can be displayed in a Jupyter Notebook or a web page.
  • Describe scale image? What is before and after on pixels in three images? The scale image function sets the width to 320px and scales the height by the same ratio, so the aspect ratio is preserved. The green square goes from 16x16 to 320x320, the clouds image stays at 320x234, and the Lassen Volcano image goes from 2792x2094 to 320x240.
  • Is scale image a type of compression? If so, line it up with College Board terms described? No, not in the sense of the College Board terms: no compression algorithm is applied, as there would be when saving to a lossy or lossless image format. The image is just being resized. Scaling a large image down does reduce the number of pixels, and the discarded pixels cannot be recovered, so its effect resembles lossy compression.
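
The grey scale and scaling steps described above can be sketched in plain Python, without PIL or numpy. The helper names `grey_pixel` and `scaled_size` are hypothetical; they mirror the arithmetic used in the lab code but are not part of it.

```python
def grey_pixel(pixel):
    """Average R, G and B; keep the alpha channel (if any) unchanged."""
    average = (pixel[0] + pixel[1] + pixel[2]) // 3  # integer division
    if len(pixel) > 3:
        return (average, average, average, pixel[3])
    return (average, average, average)

def scaled_size(width, height, base_width=320):
    """Width becomes base_width; height keeps the original aspect ratio."""
    scale_percent = base_width / float(width)
    return (base_width, int(float(height) * scale_percent))

print(grey_pixel((255, 0, 0)))         # (85, 85, 85)
print(grey_pixel((0, 128, 0, 255)))    # (42, 42, 42, 255)
print(scaled_size(16, 16))             # (320, 320)
print(scaled_size(640, 480))           # (320, 240)
```
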
from IPython.display import HTML, display
from pathlib import Path  # https://medium.com/@ageitgey/python-3-quick-tip-the-easy-way-to-deal-with-file-paths-on-windows-mac-and-linux-11a072b58d5f
from PIL import Image as pilImage # as pilImage is used to avoid conflicts
from io import BytesIO
import base64
import numpy as np

# prepares a series of images
def image_data(path=Path("images/"), images=None):  # path of static images is defaulted
    if images is None:  # default image
        images = [
            {'source': "Internet", 'label': "Green Square", 'file': "green-square-16.png"},
            {'source': "Peter Carolin", 'label': "Clouds Impression", 'file': "clouds-impression.png"},
            {'source': "Peter Carolin", 'label': "Lassen Volcano", 'file': "lassen-volcano.jpg"}
        ]
    for image in images:
        # File to open
        image['filename'] = path / image['file']  # file with path
    return images

# Large image scaled to baseWidth of 320
def scale_image(img):
    baseWidth = 320
    scalePercent = (baseWidth/float(img.size[0]))
    scaleHeight = int((float(img.size[1])*float(scalePercent)))
    scale = (baseWidth, scaleHeight)
    return img.resize(scale)

# PIL image converted to base64
def image_to_base64(img, format):
    with BytesIO() as buffer:
        img.save(buffer, format)
        return base64.b64encode(buffer.getvalue()).decode()

# Set Properties of Image, Scale, and convert to Base64
def image_management(image):  # adds meta data, scaled PIL image, and HTML to the image dictionary
    # Image open return PIL image object
    img = pilImage.open(image['filename'])
    
    # Python Image Library operations
    image['format'] = img.format
    image['mode'] = img.mode
    image['size'] = img.size
    # Scale the Image
    img = scale_image(img)
    image['pil'] = img
    image['scaled_size'] = img.size
    # Scaled HTML
    image['html'] = '<img src="data:image/%s;base64,%s">' % (image['format'].lower(), image_to_base64(image['pil'], image['format']))  # MIME type matches the saved format
    
# Create Grey Scale Base64 representation of Image
def image_management_add_html_grey(image):
    # Reuse the scaled PIL image stored by image_management
    img = image['pil']
    format = image['format']
    
    img_data = img.getdata()  # Reference https://www.geeksforgeeks.org/python-pil-image-getdata/
    image['data'] = np.array(img_data) # PIL image to numpy array
    image['gray_data'] = [] # key/value for data converted to gray scale

    # 'data' is an array of RGB(A) pixels, the array is traversed and each pixel is replaced by its grey average
    for pixel in image['data']:
        # create gray scale of image, ref: https://www.geeksforgeeks.org/convert-a-numpy-array-to-an-image/
        average = (pixel[0] + pixel[1] + pixel[2]) // 3  # average pixel values and use // for integer division
        if len(pixel) > 3:
            image['gray_data'].append((average, average, average, pixel[3])) # PNG format
        else:
            image['gray_data'].append((average, average, average))
        # end for loop for pixels
        
    img.putdata(image['gray_data'])
    image['html_grey'] = '<img src="data:image/%s;base64,%s">' % (format.lower(), image_to_base64(img, format))  # MIME type matches the saved format


# Jupyter Notebook Visualization of Images
if __name__ == "__main__":
    # Build the list of image dictionaries (defaults are used)
    images = image_data()
    
    # Display meta data, scaled view, and grey scale for each image
    for image in images:
        image_management(image)
        print("---- meta data -----")
        print(image['label'])
        print(image['source'])
        print(image['format'])
        print(image['mode'])
        print("Original size: ", image['size'])
        print("Scaled size: ", image['scaled_size'])
        
        print("-- original image --")
        display(HTML(image['html'])) 
        
        print("--- grey image ----")
        image_management_add_html_grey(image)
        display(HTML(image['html_grey'])) 
    print()
---- meta data -----
Green Square
Internet
PNG
RGBA
Original size:  (16, 16)
Scaled size:  (320, 320)
-- original image --
--- grey image ----
---- meta data -----
Clouds Impression
Peter Carolin
PNG
RGBA
Original size:  (320, 234)
Scaled size:  (320, 234)
-- original image --
--- grey image ----
---- meta data -----
Lassen Volcano
Peter Carolin
JPEG
RGB
Original size:  (2792, 2094)
Scaled size:  (320, 240)
-- original image --
--- grey image ----

Data Structures and OOP

Most data structures classes require Object Oriented Programming (OOP). Since this class is lined up with a College Course, OOP will be talked about often. Functionality in the remainder of this Blog is the same as the prior implementation. Highlight some of the key differences you see between imperative and OOP styles.

  • Read imperative and object-oriented programming on Wikipedia
  • Consider how data is organized in two examples, in relations to procedures
  • Look at Parameters in Imperative and Self in OOP
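
To make the comparison concrete, here is a minimal sketch of the same task in both styles. The `describe` function and the `ImageInfo` class are hypothetical names invented for this illustration; they are much smaller than the lab's `Image_Data` class.

```python
# Imperative style: data lives in a plain dictionary and is passed in as a parameter.
def describe(image):
    return f"{image['label']} by {image['source']}"

record = {'source': "Peter Carolin", 'label': "Lassen Volcano"}
print(describe(record))

# OOP style: the same data is stored on the object and reached through self.
class ImageInfo:  # hypothetical class name, for illustration only
    def __init__(self, source, label):
        self._source = source
        self._label = label

    @property
    def label(self):
        return self._label

    def describe(self):
        return f"{self._label} by {self._source}"

info = ImageInfo(source="Peter Carolin", label="Lassen Volcano")
print(info.describe())
```

In the imperative version the caller must hand the data to every procedure, while in the OOP version the data and the procedures that use it travel together.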

Additionally, review all the imports in these three demos. Create a definition of their purpose, specifically these ...

  • PIL
  • numpy
  • base64
from IPython.display import HTML, display
from pathlib import Path  # https://medium.com/@ageitgey/python-3-quick-tip-the-easy-way-to-deal-with-file-paths-on-windows-mac-and-linux-11a072b58d5f
from PIL import Image as pilImage # as pilImage is used to avoid conflicts
from io import BytesIO
import base64
import numpy as np


class Image_Data:

    def __init__(self, source, label, file, path, baseWidth=320):
        self._source = source    # variables with self prefix become part of the object, 
        self._label = label
        self._file = file
        self._filename = path / file  # file with path
        self._baseWidth = baseWidth

        # Open image and scale to needs
        self._img = pilImage.open(self._filename)
        self._format = self._img.format
        self._mode = self._img.mode
        self._originalSize = self._img.size
        self.scale_image()
        self._html = self.image_to_html(self._img)
        self._html_grey = self.image_to_html_grey()


    @property
    def source(self):
        return self._source  
    
    @property
    def label(self):
        return self._label 
    
    @property
    def file(self):
        return self._file   
    
    @property
    def filename(self):
        return self._filename   
    
    @property
    def img(self):
        return self._img
             
    @property
    def format(self):
        return self._format
    
    @property
    def mode(self):
        return self._mode
    
    @property
    def originalSize(self):
        return self._originalSize
    
    @property
    def size(self):
        return self._img.size
    
    @property
    def html(self):
        return self._html
    
    @property
    def html_grey(self):
        return self._html_grey
        
    # Large image scaled to baseWidth of 320
    def scale_image(self):
        scalePercent = (self._baseWidth/float(self._img.size[0]))
        scaleHeight = int((float(self._img.size[1])*float(scalePercent)))
        scale = (self._baseWidth, scaleHeight)
        self._img = self._img.resize(scale)
    
    # PIL image converted to base64
    def image_to_html(self, img):
        with BytesIO() as buffer:
            img.save(buffer, self._format)
            return '<img src="data:image/%s;base64,%s">' % (self._format.lower(), base64.b64encode(buffer.getvalue()).decode())  # MIME type matches the saved format
            
    # Create Grey Scale Base64 representation of Image
    def image_to_html_grey(self):
        img_grey = self._img.copy()  # copy so the scaled color image is not overwritten
        pixels = np.array(self._img.getdata()) # PIL image to numpy array
        
        grey_data = [] # list of pixels converted to gray scale
        # 'pixels' is an array of RGB(A) data, the array is traversed and each pixel is replaced by its grey average
        for pixel in pixels:
            # create gray scale of image, ref: https://www.geeksforgeeks.org/convert-a-numpy-array-to-an-image/
            average = (pixel[0] + pixel[1] + pixel[2]) // 3  # average pixel values and use // for integer division
            if len(pixel) > 3:
                grey_data.append((average, average, average, pixel[3])) # PNG format
            else:
                grey_data.append((average, average, average))
            # end for loop for pixels
            
        img_grey.putdata(grey_data)
        return self.image_to_html(img_grey)

        
# prepares a series of images, provides expectation for required contents
def image_data(path=Path("images/"), images=None):  # path of static images is defaulted
    if images is None:  # default image
        images = [
            {'source': "Internet", 'label': "Green Square", 'file': "green-square-16.png"},
            {'source': "Peter Carolin", 'label': "Clouds Impression", 'file': "clouds-impression.png"},
            {'source': "Peter Carolin", 'label': "Lassen Volcano", 'file': "lassen-volcano.jpg"}
        ]
    return path, images

# turns data into objects
def image_objects():        
    id_Objects = []
    path, images = image_data()
    for image in images:
        id_Objects.append(Image_Data(source=image['source'], 
                                  label=image['label'],
                                  file=image['file'],
                                  path=path,
                                  ))
    return id_Objects

# Jupyter Notebook Visualization of Images
if __name__ == "__main__":
    for ido in image_objects(): # ido is an Imaged Data Object
        
        print("---- meta data -----")
        print(ido.label)
        print(ido.source)
        print(ido.file)
        print(ido.format)
        print(ido.mode)
        print("Original size: ", ido.originalSize)
        print("Scaled size: ", ido.size)
        
        print("-- scaled image --")
        display(HTML(ido.html))
        
        print("--- grey image ---")
        display(HTML(ido.html_grey))
        
    print()
---- meta data -----
Green Square
Internet
green-square-16.png
PNG
RGBA
Original size:  (16, 16)
Scaled size:  (320, 320)
-- scaled image --
--- grey image ---
---- meta data -----
Clouds Impression
Peter Carolin
clouds-impression.png
PNG
RGBA
Original size:  (320, 234)
Scaled size:  (320, 234)
-- scaled image --
--- grey image ---
---- meta data -----
Lassen Volcano
Peter Carolin
lassen-volcano.jpg
JPEG
RGB
Original size:  (2792, 2094)
Scaled size:  (320, 240)
-- scaled image --
--- grey image ---

Hacks

Early Seed award

  • Add this Blog to your own Blogging site.
  • In the Blog add a Happy Face image.
  • Have Happy Face Image open when Tech Talk starts, running on localhost. Don't tell anyone. Show to Teacher.

AP Prep

  • In the Blog add notes and observations on each code cell that request an answer.
  • In blog add College Board practice problems for 2.3
  • Choose 2 images, one that will more likely result in lossy data compression and one that is more likely to result in lossless data compression. Explain.

Project Addition

  • If your project has images in it, try to implement an image change that has a purpose. (Ex. An item that has been sold out could become gray scale)

Pick a programming paradigm and solve some of the following ...

  • Numpy, manipulating pixels. As opposed to Grey Scale treatment, pick a couple of other types like red scale, green scale, or blue scale. We want you to be manipulating pixels in the image.
  • Binary and Hexadecimal reports. Convert and produce pixels in binary and Hexadecimal and display.
  • Compression and Sizing of images. Look for insights into compression Lossy and Lossless. Look at PIL library and see if there are other things that can be done.
  • There are many effects you can do as well with PIL. Blur the image or write Meta Data on screen, aka Title, Author and Image size.
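
For the binary and hexadecimal report idea, one possible starting point is sketched below. The `pixel_report` helper is a hypothetical name, and the pixel value is made up for the example; a real report would loop over the pixels of an image.

```python
def pixel_report(pixel):
    """Return hex and binary strings for the R, G, B values of one pixel."""
    r, g, b = pixel[:3]  # ignore alpha if present
    return {
        'hex': f"#{r:02x}{g:02x}{b:02x}",
        'binary': f"{r:08b} {g:08b} {b:08b}",
    }

report = pixel_report((0, 255, 0))
print(report['hex'])     # #00ff00
print(report['binary'])  # 00000000 11111111 00000000
```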

Hacks

Pixel Manipulation

import numpy as np
from PIL import Image as pilImage
import matplotlib.pyplot as plt

# Load image file into numpy array
image = pilImage.open('images/doggo.jpg')
image_array = np.array(image)

# Find all black pixels and set them to green
black_pixels = np.where(np.all(image_array == [0, 0, 0], axis=-1)) # Find all black pixels
image_array[black_pixels] = [0, 255, 0] # Set black pixels to green

# Save the modified image to a new file
modified_image = pilImage.fromarray(image_array)
modified_image.save('images/newdoggo.jpg')

# Create a figure with two subplots
fig, axs = plt.subplots(1, 2)

# Plot the original image in the first subplot
axs[0].imshow(image)
axs[0].set_title('Original Image')

# Plot the modified image in the second subplot
axs[1].imshow(image_array)
axs[1].set_title('Modified Image')

# Show the plot
plt.show()
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

# Load image file into numpy array
doggo = Image.open('images/doggo.jpg')
image_array = np.array(doggo)

# Set the green and blue color channels to zero
image_array[:, :, 1] = 0 # Set the green channel to zero
image_array[:, :, 2] = 0 # Set the blue channel to zero

# Save the red-scale image to a new file
red_doggo = Image.fromarray(image_array)
red_doggo.save('images/red_doggo.jpg')

fig, axs = plt.subplots(1, 2)

# Plot the original image in the first subplot
axs[0].imshow(doggo)
axs[0].set_title('Original Image')

# Plot the modified image in the second subplot
axs[1].imshow(image_array)
axs[1].set_title('Redscale Image')

# Show the plot for the comparison
plt.show()

AP Prep

  • An image saved in a format such as JPG is more likely to result in lossy compression, while an image saved in a format such as PNG results in lossless compression. For the same picture, the JPG file will usually be smaller than the PNG file because some data is discarded during compression, and that data cannot be recovered.

Data Compression Quiz

from PIL import Image
import os
from IPython.display import display
from pathlib import Path

lossy_size = os.path.getsize("images/image_compressed_lossy.jpg")
lossless_size = os.path.getsize("images/image_compressed_lossless.png")


  
# prepares a series of images
def image_data(path=Path("images/"), images=None):  # path of static images is defaulted
    if images is None:  # default image
        images = [
            {'source': "Unit 2 Vocab", 'label': "Lossy Compression", 'file': "image_compressed_lossy.jpg"},
            {'source': "Unit 2 Vocab", 'label': "Lossless Compression", 'file': "image_compressed_lossless.png"}
        ]
    for image in images:
        # File to open
        image['filename'] = path / image['file']  # file with path
    return images

def image_display(images):
    for image in images:
        # Print the label of the image
        print(f"{image['source']} - {image['label']}")
        # Load the image file and display it
        img = Image.open(image['filename'])
        # Display the image
        display(img)



# Run this as standalone tester to see sample data printed in Jupyter terminal
if __name__ == "__main__":
    
    # printing the size of the images
    print(f"Size of the lossy image: {lossy_size} bytes\n")
    print(f"Size of the lossless image: {lossless_size} bytes\n")
    
    # display default images from image_data()
    default_images = image_data()
    image_display(default_images)
Size of the lossy image: 106523 bytes

Size of the lossless image: 803895 bytes

Unit 2 Vocab - Compressed
Unit 2 Vocab - Uncompressed
import os
from IPython.display import display  # needed by image_display below
from PIL import Image as pilImage
from matplotlib import pyplot as plt
from pathlib import Path

# Opens the image file same one as before
with pilImage.open("/home/tirth/vscode/APCSP-Blog/images/Profile.jpg") as image:
    # Save the image using lossy JPEG compression
    image.save("images/image_compressed_very_lossy.jpg", "JPEG", quality=1)
    
    # Save the image using lossless PNG compression
    image.save("images/image_compressed_lossless_2.png", "PNG")

def image_data(path=Path("images/"), images=None):  # path of static images is defaulted
    if images is None:  # default image
        images = [
            {'source': "Unit 2 Vocab", 'label': "Compressed", 'file': "image_compressed_very_lossy.jpg"},
            {'source': "Unit 2 Vocab", 'label': "Uncompressed", 'file': "image_compressed_lossless_2.png"}
        ]
    for image in images:
        # File to open
        image['filename'] = path / image['file']  # file with path
    return images

def image_display(images):
    for image in images:
        # Print the label of the image
        print(f"{image['source']} - {image['label']}")
        # Load the image file and display it (pilImage is the alias imported in this cell)
        img = pilImage.open(image['filename'])
        # Display the image
        display(img)

if __name__ == "__main__":
    
    # Get the file sizes of the images saved above, before printing them
    lossy_size = os.path.getsize("images/image_compressed_very_lossy.jpg")
    lossless_size = os.path.getsize("images/image_compressed_lossless_2.png")
    
    # Prints the file sizes
    print(f"Lossy image size: {lossy_size} bytes")
    print(f"Lossless image size: {lossless_size} bytes")
    
    # display default images from image_data()
    default_images = image_data()
    image_display(default_images)
Lossy image size: 29631 bytes
Lossless image size: 803895 bytes
Unit 2 Vocab - Compressed
Unit 2 Vocab - Uncompressed

Programming Paradigm

from PIL import Image, ImageFilter

# Load the image
image = Image.open("images/Happy.png")

# Apply the blur filter many times in a row for a stronger effect
blurred_image = image.filter(ImageFilter.BLUR).filter(ImageFilter.BLUR).filter(ImageFilter.BLUR).filter(ImageFilter.BLUR).filter(ImageFilter.BLUR).filter(ImageFilter.BLUR).filter(ImageFilter.BLUR).filter(ImageFilter.BLUR).filter(ImageFilter.BLUR).filter(ImageFilter.BLUR).filter(ImageFilter.BLUR).filter(ImageFilter.BLUR).filter(ImageFilter.BLUR).filter(ImageFilter.BLUR).filter(ImageFilter.BLUR).filter(ImageFilter.BLUR).filter(ImageFilter.BLUR).filter(ImageFilter.BLUR).filter(ImageFilter.BLUR).filter(ImageFilter.BLUR).filter(ImageFilter.BLUR).filter(ImageFilter.BLUR).filter(ImageFilter.BLUR).filter(ImageFilter.BLUR).filter(ImageFilter.BLUR).filter(ImageFilter.BLUR).filter(ImageFilter.BLUR).filter(ImageFilter.BLUR).filter(ImageFilter.BLUR).filter(ImageFilter.BLUR).filter(ImageFilter.BLUR).filter(ImageFilter.BLUR)

# Display the original and blurred images side by side
new_image = Image.new('RGB', (image.width * 2, image.height))
new_image.paste(image, (0, 0))
new_image.paste(blurred_image, (image.width, 0))
new_image.show()
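
The long chain of `.filter(ImageFilter.BLUR)` calls can also be written as a loop. The sketch below assumes Pillow is installed; the `blur_repeatedly` helper, the generated test image, and the choice of 5 passes are all illustrative and are not taken from the cell above.

```python
from PIL import Image, ImageFilter

def blur_repeatedly(img, passes):
    """Apply ImageFilter.BLUR `passes` times and return the resulting image."""
    result = img
    for _ in range(passes):
        result = result.filter(ImageFilter.BLUR)  # filter() returns a new image
    return result

# Hypothetical test image: a white canvas with a black square in the middle.
canvas = Image.new("RGB", (64, 64), "white")
canvas.paste((0, 0, 0), (16, 16, 48, 48))

blurred = blur_repeatedly(canvas, passes=5)
print(blurred.size)  # same dimensions as the input
```

A loop makes the number of passes a single value that is easy to change, instead of something that has to be counted in a long line.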