MUSA GitHub Workshop

Using git and GitHub for your projects and portfolio

Motivating scenario --

Imagine this scenario...


I'm working on a model for dynamic parking pricing for OTIS in Philadelphia.

  • dynamic_pricing.r
  • dynamic_pricing_revised.r
  • dynamic_pricing_revised_2.r
  • dynamic_pricing_final.r
  • dynamic_pricing_final_2.r

Goals

Product: You will leave here with a portfolio site hosted on GitHub Pages that you'll be able to add entries to as you complete projects in the MUSA program.

Purpose: To introduce you to the concept of version control, and to give you a basic understanding of why and how to use version control — specifically a version control system called git, along with the website GitHub.

Agenda

  1. Intro — setting the stage (5 min)
  2. What is a VCS? What is git? What is GitHub? (10 min)
  3. Getting started with GitHub (10 min)
    • Signing up for an account
    • Creating your portfolio
  4. Customizing your portfolio (15 min)
    • Adding projects with Markdown
    • Modifying the configuration with YAML
    • Peeling back the curtain — how's it work?
  5. Using git
    • Installing a git client
    • Cloning, updating, committing, pulling, pushing

What is a version control system?

The purpose of a version control system (VCS -- also called a source code manager or SCM) is to make it easier to manage and organize changes in your code. It allows you to track changes to files, quickly compare different versions of files, revert to previous versions, and sync new changes into those files.

Version control is a time-travel enabling super-power 🦸🏾‍♀️!

Direct benefits of version control

  • Tracking changes: A VCS monitors the changes you make to your code. Instead of saving new versions of your files like "script_final," "script_final_revised," and so on (which we've all done 😒), a VCS keeps a detailed record of each change, and allows you to add annotations to your changes like "added an introduction," "changed the main function's name," and so forth.
  • Collaboration: If you're working with colleagues on the same code or sharing your code with others, a VCS helps everyone stay on the same page. It ensures that everyone can work on the code at the same time without messing things up. It's akin to working with others in the same Google Doc, rather than emailing around different versions of a Word document.
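To make "a detailed record of each change" a bit more concrete, here is a minimal sketch using Python's standard difflib module. It is not how git stores history internally; it only illustrates the kind of line-by-line comparison a VCS shows you. The two script versions are made up for the example.

```python
import difflib

# Two hypothetical versions of the same script, as lists of lines.
before = [
    "rate = 2.00",
    "price = rate * hours",
]
after = [
    "rate = 2.50",
    "price = rate * hours",
    "print(price)",
]

# unified_diff marks removed lines with "-" and added lines with "+".
diff_lines = list(
    difflib.unified_diff(before, after, fromfile="before", tofile="after", lineterm="")
)

# Keep only the changed lines, skipping the "---" / "+++" file headers.
changes = [
    line for line in diff_lines
    if line[0] in "+-" and not line.startswith(("+++", "---"))
]

print("\n".join(diff_lines))
```

A VCS pairs a comparison like this with your annotation (the commit message), so you can see both what changed and why.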

... and a few indirect benefits of version control

  • Backup: It's like a safety net. If you accidentally delete something important or your code becomes a big mess, you can go back to a previous version.

  • Experimentation: Want to try a new idea without destroying your current work? A VCS lets you create a new "branch" to experiment without affecting the main code.

  • Documentation: It's a log of your coding journey. You can see why you made certain changes, which can be really helpful when you revisit your code later.

Why git, and why GitHub?

There are other VC systems -- svn, hg, even some niche ones (like Piper, which Google uses internally). Is git the best VCS? That's debatable, and a matter of opinion.

But it's definitely the most popular by far.

When git debuted it had several things going for it:

  • It was invented by Linus Torvalds, the inventor of Linux. There's not much stronger star power you could have for a tool that developers use.

  • Merging became magic ✨. git was -- and is -- really smart about merging code changes from different collaborators together.

    git handles most merge situations seamlessly. ... But so do other tools that were invented around the same time (like hg or bzr). So the question remains: why git?

I think the thing that pushed git over the top was...

GitHub

  • Launched in 2008.
  • It created tools for the social aspects of collaboration (managing code change permissions, requesting code merges), ensuring strong network effects, while relying on git for the technical aspects.
  • It stayed ahead of its competition (like BitBucket) by adding features that developers wanted (like GitHub Pages, which we'll use today), and by more effective corporate strategy, gaining favor among both open source and startup communities.
  • Bought by Microsoft in 2018
Sample GitHub version history --

Sample GitHub version history

Getting started with GitHub

GitHub Home Page --

Sign up for a GitHub account

Take a few minutes to do this now ⏲️.

(Use your personal or Penn email address -- you can add more than one address later)

Setting up the portfolio website

Forking a repository --

Make a copy of the portfolio template repository

  1. Navigate to https://www.github.com/weitzman-musa/portfolio-template
  2. Click the "Use this template" button to create a new repository based on this template.

Vocabulary (a quick aside...)

  • Repository: A repository (or "repo") is a collection of files and folders that you've told git to keep track of. You may have many repositories in your GitHub account.
  • Fork: Other people can copy a repository from your account to their account; this process is called "forking".
Forked Portfolio Repository --

You've forked the repository!

If you refresh you should see the same files as before, but under your account.

Browsing Files --

You can browse the files...

...but we can do better.

GitHub Settings, General Tab --

Open the repository settings

GitHub Settings, Pages Tab --

Open the GitHub Pages settings

GitHub Settings, Select Pages Branch --

Configure the GitHub Pages site

Select the deployment branch and click Save.

Branches 1 --
Branches 2 --
Branches 3 --
Branches 4 --
Branches 5 --
Branches 6 --
GitHub Settings, Pages Finished Building --

In a couple minutes, your site will be ready!

You can keep refreshing the settings page to see when it's done.

Your URL will be of the form: https://[USERNAME].github.io/[REPO_NAME]/

e.g. https://mjumbewu.github.io/portfolio/
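The URL pattern above can be sketched as a tiny helper. The function name pages_url is hypothetical, and the sketch only covers the default pattern shown on this slide (not custom domains).

```python
def pages_url(username: str, repo_name: str) -> str:
    """Build the default GitHub Pages URL for a repository."""
    # Follows the pattern https://[USERNAME].github.io/[REPO_NAME]/
    return f"https://{username}.github.io/{repo_name}/"


print(pages_url("mjumbewu", "portfolio"))
```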

Full-screen Portfolio - Initial --

A new portfolio is now available at your URL (but you haven't really made it yours yet).

Customizing your portfolio

Updating your projects

  • Project pages (and blog pages, if you choose to use them) are written in Markdown.
  • Markdown is a simple formatting language for text, used to style and structure content, often (but not exclusively) for the web.
Project 1 initial state - portfolio page --

Consider the following project page...

Project 1 initial state - source preview --

The content of the page is written in Markdown. We can find the code if we browse to the file proj-1.md in the repository and click on the "Edit this file" button (the pencil icon).

Note: Markdown file names generally contain only lowercase letters, numbers, dashes (-), and underscores (_), and end in .md.
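If you'd like to check a file name against this convention, a short sketch with a regular expression might look like the following. The helper name follows_convention is made up for illustration; nothing in GitHub Pages enforces this rule.

```python
import re

# Lowercase letters, digits, dashes, and underscores, followed by ".md".
MD_NAME = re.compile(r"[a-z0-9_-]+\.md")


def follows_convention(file_name: str) -> bool:
    """Return True if the whole file name matches the convention."""
    return MD_NAME.fullmatch(file_name) is not None


print(follows_convention("proj-1.md"))      # matches the convention
print(follows_convention("My Project.md"))  # uppercase and a space do not
```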

Project 1 initial state - source --

This is what the Markdown code looks like.

Markdown basics

The first part of the file is called the front matter. It's written in YAML. The start and end of the front-matter section are indicated by the --- characters.

---
layout: post
title: 'Project One'
thumbnail: /assets/img/projects/proj-1/thumbnail.jpg
---

The front-matter section describes metadata about the page -- things that may or may not show up directly on the page, but are important for the site to know about.
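To give a feel for how a site builder might separate the front matter from the page content, here is a rough sketch in plain Python. It is not the parser GitHub Pages actually uses: it assumes the closing --- line exists and only understands flat "key: value" lines like the ones above. The function name split_front_matter and the one-line page body are made up for the example.

```python
def split_front_matter(text: str):
    """Return (metadata, body) for a page that starts with a --- block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front matter at all

    # Find the closing --- line (assumed to exist in this sketch).
    end = next(
        i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"
    )

    metadata = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        # Drop surrounding whitespace and optional quotes around the value.
        metadata[key.strip()] = value.strip().strip("'\"")

    body = "\n".join(lines[end + 1:])
    return metadata, body


page = """---
layout: post
title: 'Project One'
thumbnail: /assets/img/projects/proj-1/thumbnail.jpg
---
Some Markdown content.
"""

metadata, body = split_front_matter(page)
print(metadata["title"])
print(body)
```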

Markdown basics

Below the front-matter section is the content of the page, written in Markdown. Here's an example of some Markdown code:


### Markdown example

With Markdown you can do things like **bold** and
_italicize_ text, create [links](https://www.google.com),
and add images.

It looks like this when rendered:

Markdown example

With Markdown you can do things like bold and italicize text, create links, and add images.
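Behind the scenes, "rendering" means turning Markdown into HTML. The toy sketch below handles only the three inline features from the example (bold, italics, and links) with regular expressions. Real Markdown renderers are far more careful; for instance, this sketch would mangle a link whose URL contains underscores. The function name render_inline is hypothetical.

```python
import re


def render_inline(markdown: str) -> str:
    """Convert **bold**, _italic_, and [text](url) to HTML (toy version)."""
    html = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", markdown)
    html = re.sub(r"_(.+?)_", r"<em>\1</em>", html)
    html = re.sub(r"\[(.+?)\]\((.+?)\)", r'<a href="\2">\1</a>', html)
    return html


print(render_inline("**bold** and _italicize_ text, [links](https://www.google.com)"))
```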

Markdown basics

If you want to add an image, you can do so by adding an image tag like this:

![Image description](/assets/img/projects/proj-1/image.jpg)

You can upload additional images into the assets/img/ folder and reference them in your Markdown file.

Try it out!

Take a few minutes ⏲️ to try the following:

  1. Find the _pages/about.md file in your repository.
  2. Click the "Edit this file" button (the pencil icon).
  3. Make changes to the file (anything -- you can always change it later!).
  4. Click the "Commit changes" button at the bottom of the page.
  5. Write a good commit message.
  6. Commit directly to the main branch.
  7. Reload the About page on your portfolio site.

Site-wide settings

  • Other site content and settings are stored in the files _config.yml and data/settings.yml. These files are written in YAML.
  • YAML (YAML Ain't Markup Language) is a human-readable data serialization format. It's often used to configure settings or data structures in a more readable way than traditional programming languages.
Settings YAML --

Find the data/settings.yml file in your repository. This is what YAML looks like.

YAML Basics

YAML files are made up of key-value pairs. Each key is the name of a particular piece of data for the site, and the value is what will be used when the key is referenced.

For example:

# Site header logo
logo: '/assets/img/musa-logo.png'

# Site menu entries
menu:
- {name: 'Projects', url: ' '}
- {name: 'Blog',     url: 'blog'}
- {name: 'About',    url: 'about'}
- {name: 'Contact',  url: 'contact'}
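If you're more used to a programming language than to YAML, it may help to see roughly what a YAML parser would hand back for the example above. The sketch below writes that structure out by hand as a Python dictionary (no YAML library is used, and the variable names are made up): a key with a single value, and a key whose value is a list of small key-value objects.

```python
# Hand-written Python equivalent of the YAML example above.
settings = {
    "logo": "/assets/img/musa-logo.png",
    "menu": [
        {"name": "Projects", "url": " "},
        {"name": "Blog", "url": "blog"},
        {"name": "About", "url": "about"},
        {"name": "Contact", "url": "contact"},
    ],
}

# Looking up a key gives you its value.
print(settings["logo"])

# A list value can be looped over, e.g. to build the site menu.
menu_names = [entry["name"] for entry in settings["menu"]]
print(menu_names)
```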

Using git outside of GitHub

Installing a git client

There are many options for git clients. I recommend GitHub Desktop [1], which is available for Mac and Windows.

You can also use a client that's integrated into your code editor. For example, VS Code [2] has a built-in git client.

Finally, you can use the command line. This is the most powerful option, but it's also the most difficult to learn. If you're interested in learning more about this, I recommend this tutorial.


[1] https://desktop.github.com/

[2] https://code.visualstudio.com/

GitHub Repository - Cloning Options --

Clone your repository

Where is your code?

  • Your git client will download a copy of your repository to your computer. This is called "cloning" the repository. With GitHub Desktop, the default location is a folder called "GitHub" inside your Documents folder.

Update, commit, push!

  1. Make changes to your local files
  2. Commit those changes to your repository
  3. Push those changes to GitHub
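Graphical clients hide the details, but the steps above roughly correspond to three git commands. The sketch below assumes git is installed and that repo_dir points at your cloned repository. The function names sync_commands and sync are made up for illustration, and your client may do things slightly differently under the hood.

```python
import subprocess


def sync_commands(message: str) -> list[list[str]]:
    """Return the git commands for 'commit everything, then push'."""
    return [
        ["git", "add", "--all"],           # stage every change you made
        ["git", "commit", "-m", message],  # record the staged changes locally
        ["git", "push"],                   # send the new commit(s) to GitHub
    ]


def sync(repo_dir: str, message: str) -> None:
    """Run the commands inside the repository folder."""
    for command in sync_commands(message):
        # check=True stops at the first command that fails.
        subprocess.run(command, cwd=repo_dir, check=True)


for command in sync_commands("Update the About page"):
    print(" ".join(command))
```

Calling sync(...) with the path to your repository would actually run the commands; the loop at the end only prints them.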

... but sometimes there may be conflicts!

  1. Modify the /data/settings.yml file on GitHub:
    intro_title: My Personal Mission 10-word Summary
    
    Commit that change.
  2. Modify the /data/settings.yml file on your computer:
    intro_title: The Personal Mission 10-word Summary
    
    Commit that change.
  3. Try to push your changes to GitHub. It should prompt you to fetch first. Fetch, and then pull. You should see a merge conflict.
Merge Conflicts in settings.yml --

Conflicts look something like this

Conflicts can be resolved!

  • The first line (<<<<<<< HEAD) indicates the start of the conflict, and everything after that until the divider (=======) is the version of the content from your computer.
  • Everything after the divider until the end (>>>>>>> 3e786d06...) is the latest commit content from GitHub (the long hexadecimal number is the commit hash).
  • You just have to figure out what the correct code is, and then delete the conflict markers (the lines starting with <<<<<<<, =======, and >>>>>>>).
  • Each git client provides you different ways of making those choices. In GitHub Desktop, you can click on the conflict markers to choose which version you want to keep.
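To make the structure of a conflict concrete, here is a small sketch that splits one conflict block into the two competing versions. It uses the intro_title lines from the exercise above and assumes a single, well-formed conflict block. The function name split_conflict is hypothetical; in practice your git client does this for you.

```python
def split_conflict(text: str):
    """Return (local_lines, remote_lines) from one conflict block."""
    local, remote = [], []
    target = None  # which list the current line belongs to
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            target = local       # your computer's version follows
        elif line.startswith("=======") and target is local:
            target = remote      # GitHub's version follows
        elif line.startswith(">>>>>>>"):
            target = None        # end of the conflict
        elif target is not None:
            target.append(line)
    return local, remote


conflicted = """<<<<<<< HEAD
intro_title: The Personal Mission 10-word Summary
=======
intro_title: My Personal Mission 10-word Summary
>>>>>>> 3e786d06...
"""

local, remote = split_conflict(conflicted)
print("Local: ", local)
print("Remote:", remote)
```

Resolving the conflict means deciding which of the two versions (or what combination) should remain, and saving the file without the marker lines.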

Collaboration recommendations

  1. When working on the same code base, work on different components
    • Ideally those components are in different files
    • If they're in the same file, work on different parts of the file
    • Nothing beats effective real-life communication!
  2. When merge conflicts arise, rely on your tools
    • Merge conflicts will happen. Take a deep breath, follow the instructions in your git client, and resolve them carefully.
  3. Commit and sync (push/pull) your code frequently
    • This will help you avoid merge conflicts, or at least allow you to resolve them before they become too complicated

Additional Resources

Set a custom domain for your site

See https://docs.github.com/en/pages/configuring-a-custom-domain-for-your-github-pages-site

Learn more about Markdown syntax

How (and why) to write good commit messages

From FreeCodeCamp: https://www.freecodecamp.org/news/how-to-write-better-git-commit-messages/

Learn more about YAML

Decent YAML Basics from Tutorials Point: https://www.tutorialspoint.com/yaml/yaml_basics.htm

Sample GH Pages themes

Common actions:

  • Fork a repository
  • Clone a repository
  • Create a branch
  • Make and stage changes
  • Commit changes
  • Push changes
  • Submit a pull request
  • Merge a pull request
  • Checkout a branch

To start off: by show of hands, get a sense for who has heard of GitHub, and who has a GitHub account. Who has used it on a project with other people? Note that they're not expected to. I want you all to leave here with a functioning portfolio.

Objectives:

  • Create a GitHub account (if you don't have one)
  • Create a portfolio repository
  • Add a project to your portfolio
  • Learn basic Git commands

This pattern is based very roughly on a real project I worked on not too long ago, though the specifics have been changed.

Let's say I work for the office of transportation, infrastructure, and sustainability (OTIS) in Philadelphia. I'm working on a model for dynamic parking pricing. This is a model that will adjust the price of parking in the city based on demand. The goal is to reduce congestion and pollution in the city, while keeping parking available for when it's needed. I have examples of prior art from other cities, but I want to make sure that my model is as accurate as possible for the context of Philadelphia. This means I'm going to have to try out a few models, get feedback from my colleagues, and iterate on my work.

Let's say I'm working in R and I put together a first pass in a file named dynamic_pricing. This is entirely reasonable. [CLICK]

Now I run the model by some of my colleagues and they give me some other ideas that might improve the model, but might not. I want to test it so I create a copy of my file called dynamic_pricing_revised. This is the beginning of my trouble. [CLICK]

That same process happens again and I end up with a revision of my revision, dynamic_pricing_revised_2, since I don't want to lose my previous versions, just in case. [CLICK]

Finally, I'm ready to share my work in a wider presentation to stakeholders. [CLICK]

This is a pretty common pattern, and it's a mess. Someone coming to the code base 3 months in the future isn't going to know which files are necessary to keep around, and which are just clutter. And many times you actually see something like this: [CLICK]

We can do better, and in this workshop we're going to talk about how.

_This workshop is in line with the following MUSA program learning outcomes:_

  • _Exercise professional skills and be well-equipped to enter the current workforce._
  • _Be able to work collaboratively and cooperatively with peers and stakeholders._

git is a version control system, but what's that?

There's a lot of text on this slide, but you don't have to read it all. You _can_ access these slides online.

The first version of git was released in 2005. In 2009, most of the projects that I was working on were tracked in Subversion. One was tracked in Mercurial and one in CVS. By 2014, all of the projects that I was working on were tracked in Git.

[REGARDING THE "MOST POPULAR"] Is this true? I don't have a statistic to back it up, and there are many metrics that you could possibly use, but based on what I've seen in tech writ large, git is the most popular VCS. Even if it's a plurality, I'm fairly certain there's no VCS more popular than git today. [back to the slide]

[merging became magic...] In time, many of you will come to have complex feelings toward merging code -- among those feelings will be anxiety, anger, perhaps fear. However, you have to understand that it used to be so much worse.

GitHub was launched in 2008, 3 years after the first release of git. It wasn't the only site where you could share your code and it's still not.

Here we have a screenshot demonstrating how GitHub is more than just a place to store code. This is a small portion of what's called the "commit history" of a Python package called GeoPandas, which is widely used for geospatial data analysis. This screenshot is showing us specific changes from the vast number of modifications that have been made to this project over time. What's more, like many of the software tools you'll be using in various classes, GeoPandas is an open-source project, which means that its development is driven by a community of contributors. For example, you can see one of my commits here. I don't work for any company that maintains and owns GeoPandas; such a company doesn't exist. So, Git as a version control system is a powerful tool for facilitating this kind of decentralized collaboration, and GitHub enhances Git by providing a user-friendly interface and additional features that support collaboration and project management.

Generally, to use git and GitHub you want to have a GitHub account _and_ a program on your computer that connects to GitHub (called a git client). You can use GitHub without a local git client, just by making changes directly on the GitHub website, and for some changes this is a reasonable thing to do. Very often it is not, but we'll get to that later. [back to the slides]

I recommend everyone follow along on their own GitHub accounts. If you get lost, these slides are available, but we'll also have another opportunity to take a break or catch up later.

I created a repository that you can use as a template for your portfolio. I actually created this repository based on someone else's work. I recommend taking a read through the information contained in the README file. When you click "Use this template", this is going to take the folders and files in this **repository** and copy them into a new repository that you own. This may be the first time that I introduced the word "repository". It's a very central concept in GitHub, git, and version control in general.

  • **Clone**: You can also copy a repository from your account or someone else's account to your computer; this process is called "cloning".
  • **Push** / **Pull**: Git helps you keep various copies of repositories synchronized by "pushing" your changes into other copies or "pulling" other people's changes into yours.

Technically, at this point anyone could access the content in this portfolio website (assuming you made the repository "Public") by just browsing to the appropriate folders and files. Each one of the individual project pages in this site is represented by a Markdown file; we'll talk about Markdown a little later, but I also know that you'll cover it in more depth at some point in 508. Browsing files like this certainly isn't the most engaging way to see the information, but GitHub has a feature called GitHub Pages that will allow us to automatically build these Markdown files into a website.

To get to the pages settings, first open the repository settings with the Settings button which should be in the top-right of your repo links.

Then, over on the left side you should see an option labeled "Pages".

The only setting we'll have to configure is to select the "branch" that GitHub Pages will use. In a git repository you can have several "branches", each of which has a slightly different version of your website. You can then "merge" one branch into another. This can be useful in this case, for example, if you wanted to upload a draft of a post to your repository, but don't want to publish it on your live site yet. Right now, there should only be one branch in your repo -- called "main". Select that branch and click the "Save" button.

I'll give a quick overview of branches while we wait for your GitHub Pages site to deploy. Branches are a way that you can work on one version of your project without worrying about breaking another version. A clear example of this in public-facing websites like portfolios is if you want to have a draft version of your repository in addition to the one that is live for everyone to see.

Let's say your portfolio is live on the main branch of your repository. Initially your commit history may look like this, where you've started the repository with the necessary files, and then later on you added in a project. Now let's say you want to add information about a second project, but you're not quite ready to publish it yet.

You might choose to create a new branch (which I've called "draft" here), and you can add the post about your second project on that branch. Now what you have is essentially two different versions of your code. If I want or need, I can actually modify each of these versions independently.

For example, let's say I need to correct the title of my first project -- maybe I misspelled it the first time, or maybe I decided to rename it. Regardless, I can update the name of that project without needing to publish my draft information about project 2.

When I'm ready I can complete my post about project 2...

And then to get the post onto my GitHub Pages hosted site, I can merge the changes back into my main branch. So branches are these parallel work streams that you can maintain separately and then have git fold back together when you're ready.
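If it helps to see the branch story as data, here is a toy model of it in Python. This is not how git is implemented (git identifies commits by hashes, not counters), and every name in it is made up. The idea it shows is a common way to think about branches: each commit remembers its parent(s), and a branch is just a label pointing at the latest commit on that line of work.

```python
# Each commit records its message and the ids of its parent commits.
commits = {}
# Each branch name points at the id of its most recent commit.
branches = {}


def commit(branch, message, extra_parent=None):
    """Add a commit on a branch; extra_parent is used for merges."""
    parents = [p for p in (branches.get(branch), extra_parent) if p is not None]
    commit_id = len(commits) + 1   # simple counter instead of a hash
    commits[commit_id] = {"message": message, "parents": parents}
    branches[branch] = commit_id   # the branch now points at the new commit
    return commit_id


commit("main", "Add starter files")
commit("main", "Add project 1")
branches["draft"] = branches["main"]         # create "draft" starting from main
commit("draft", "Start project 2 post")
commit("main", "Fix project 1 title")        # main moves on independently
commit("draft", "Finish project 2 post")
merge_id = commit("main", "Merge draft into main", extra_parent=branches["draft"])

print(commits[merge_id])
```

The merge commit is the only one with two parents: the latest commit on main and the latest commit on draft. That is the point where the two parallel work streams are folded back together.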

When your site is done being built you'll be able to see the URL for the site up at the top as in this image. For mine I have a custom URL set. You can too, but we're not going to go into how to set that up today. There will be a link where you can read more about that in the **Additional Resources** section. The reason that the site isn't _immediately_ available or updated is because GitHub has to do some work to prepare the site for you. You can actually see the work that it does in your repository's **Actions**.

This is the page that you come to if you click on the "Project 1" link from the home page.

Note that there's nothing special about that file name. In fact, when you're adding actual projects, I'd encourage you to use sensible, relevant file names. The main guidance I'd give about file names for markdown files is that you should stick to lowercase letters, numbers, dashes, and underscores -- this is mainly by convention.

Let's take a tour

We'll come back to YAML, as there are a number of other settings for our site that are also written in YAML.

There are a couple links to syntax guides for Markdown in the "Additional Resources" section.

For the sake of organization, I recommend placing all images relevant to a particular project page under an assets/img/projects/proj-name/ folder.

A good commit message doesn't have to be long, but it should describe the change that you made, and ideally why you made it. These messages will create a comprehensive log for you and others in the future.

Values can have different types. They can be strings, numbers, booleans, lists, objects, or null. You can find more about YAML in the Additional Resources section. For now, you don't have to know the mechanics of how these values actually get used in the page, you'll just have to know where to set them.

The first thing you'll do with your code is clone it. This is essentially downloading your code, but in the folder where your code is, git will also store information about where the code came from. So it's not just the code that gets downloaded to your computer, but also metadata about changes that have been made to that code over time and where the code can be pushed to when new changes are made.

Once you've cloned your code, the three most common things that you'll end up doing through git are: [READ THE SLIDE]

Each git client has a slightly different way of doing each of these, and they're all fine, but I encourage you to get very comfortable with the way that your git client has you add changes, commit those changes, and push them up to GitHub.

Conflicts happen when git doesn't know what action should be taken, for example when two commits change the same line of a file in different ways.