To start off: by show of hands, get a sense for who has heard of GitHub, and who has a GitHub account. Who has used it on a project with other people? Note that they're not expected to.

I want you all to leave here with a functioning portfolio.

Objectives:

- Create a GitHub account (if you don't have one)
- Create a portfolio repository
- Add a project to your portfolio
- Learn basic Git commands
This pattern is based very roughly on a real project I worked on not too long ago, though the specifics have been changed.
Let's say I work for the Office of Transportation, Infrastructure, and Sustainability (OTIS) in Philadelphia. I'm working on a model for dynamic parking pricing. This is a model that will adjust the price of parking in the city based on demand. The goal is to reduce congestion and pollution in the city, while keeping parking available for when it's needed. I have examples of prior art from other cities, but I want to make sure that my model is as accurate as possible for the context of Philadelphia. This means I'm going to have to try out a few models, get feedback from my colleagues, and iterate on my work.
Let's say I'm working in R and I put together a first pass in a file named dynamic_pricing. This is entirely reasonable. [CLICK]
Now I run the model by some of my colleagues and they give me some other ideas that might improve the model, but might not. I want to test it so I create a copy of my file called dynamic_pricing_revised. This is the beginning of my trouble. [CLICK]
That same process happens again and I end up with a revision of my revision, dynamic_pricing_revised_2, since I don't want to lose my previous versions, just in case. [CLICK]
Finally, I'm ready to share my work in a presentation to a wider group of stakeholders. [CLICK]
This is a pretty common pattern, and it's a mess. Someone coming to the code base 3 months in the future isn't going to know which files are necessary to keep around, and which are just clutter. And many times you actually see something like this: [CLICK]
We can do better, and in this workshop we're going to talk about how.
_This workshop is in line with the following MUSA program learning outcomes:_

- _Exercise professional skills and be well-equipped to enter the current workforce._
- _Be able to work collaboratively and cooperatively with peers and stakeholders._
git is a version control system, but what's that?
There's a lot of text on this slide, but you don't have to read it all. You _can_ access these slides online.
The first version of git was released in 2005. In 2009, most of the projects that I was working on were tracked in Subversion. One was tracked in Mercurial and one in CVS. By 2014, all of the projects that I was working on were tracked in Git.
[REGARDING THE "MOST POPULAR"] Is this true? I don't have a statistic to back it up, and there are many metrics that you could possibly use, but based on what I've seen in tech writ large, git is the most popular VCS. Even if it's a plurality, I'm fairly certain there's no VCS more popular than git today. [back to the slide]
[merging became magic...] In time, many of you will come to have complex feelings toward merging code -- among those feelings will be anxiety, anger, perhaps fear. However, you have to understand that it used to be so much worse.
GitHub was launched in 2008, 3 years after the first release of git. It wasn't the only site where you could share your code and it's still not.
Here we have a screenshot demonstrating how GitHub is more than just a place to store code. This is a small portion of what's called the "commit history" of a Python package called GeoPandas, which is widely used for geospatial data analysis. This screenshot is showing us specific changes from the vast number of modifications that have been made to this project over time. What's more, like many of the software tools you'll be using in various classes, GeoPandas is an open-source project, which means that its development is driven by a community of contributors. For example, you can see one of my commits here. I don't work for any company that maintains and owns GeoPandas; such a company doesn't exist. So, Git as a version control system is a powerful tool for facilitating this kind of decentralized collaboration, and GitHub enhances Git by providing a user-friendly interface and additional features that support collaboration and project management.
Generally, to use git and GitHub you want to have a GitHub account _and_ a program on your computer that connects to GitHub (called a git client). You can use GitHub without a local git client, just by making changes directly on the GitHub website, and for some changes this is a reasonable thing to do. Very often it is not, but we'll get to that later. [back to the slides]
I recommend everyone follow along on their own GitHub accounts. If you get lost, these slides are available, but we'll also have another opportunity to take a break or catch up later.
I created a repository that you can use as a template for your portfolio. I actually created this repository based on someone else's work. I recommend taking a read through the information contained in the README file. When you click "Use this template", this is going to take the folders and files in this **repository** and copy them into a new repository that you own. This may be the first time that I've introduced the word "repository". It's a very central concept in GitHub, git, and version control in general.
- **Clone**: You can also copy a repository from your account or someone else's account to your computer; this process is called "cloning".
- **Push** / **Pull**: Git helps you keep various copies of repositories synchronized by "pushing" your changes into other copies or "pulling" other people's changes into yours.
Technically, at this point anyone could access the content in this portfolio website (assuming you made the repository "Public") by just browsing to the appropriate folders and files. Each one of the individual project pages in this site is represented by a Markdown file; we'll talk about Markdown a little later, but I also know that you'll cover it in more depth at some point in 508. Browsing files like this certainly isn't the most engaging way to see the information, but GitHub has a feature called GitHub Pages that will allow us to automatically build these Markdown files into a website.
To get to the pages settings, first open the repository settings with the Settings button which should be in the top-right of your repo links.
Then, over on the left side you should see an option labeled "Pages".
The only setting we'll have to configure is to select the "branch" that GitHub Pages will use. In a git repository you can have several "branches", each of which has a slightly different version of your website. You can then "merge" one branch into another. This can be useful in this case, for example, if you want to upload a draft of a post to your repository, but don't want to publish it on your live site yet. Right now, there should only be one branch in your repo -- called "main". Select that branch and click the "Save" button.
I'll give a quick overview of branches while we wait for your GitHub Pages site to deploy. Branches are a way that you can work on one version of your project without worrying about breaking another version. A clear example of this in public-facing websites like portfolios is if you want to have a draft version of your repository in addition to the one that is live for everyone to see.
Let's say your portfolio is live on the main branch of your repository. Initially your commit history may look like this, where you've started the repository with the necessary files, and then later on you added in a project. Now let's say you want to add information about a second project, but you're not quite ready to publish it yet.
You might choose to create a new branch (which I've called "draft" here), and you can add the post about your second project on that branch. Now what you have is essentially two different versions of your code. If I want or need, I can actually modify each of these versions independently.
For example, let's say I need to correct the title of my first project -- maybe I misspelled it the first time, or maybe I decided to rename it. Regardless, I can update the name of that project without needing to publish my draft information about project 2.
When I'm ready I can complete my post about project 2...
And then to get the post onto my GitHub Pages hosted site, I can merge the changes back into my main branch. So branches are these parallel work streams that you can maintain separately and then have git fold back together when you're ready.
When your site is done being built you'll be able to see the URL for the site up at the top as in this image. For mine I have a custom URL set. You can too, but we're not going to go into how to set that up today. There will be a link where you can read more about that in the **Additional Resources** section. The reason that the site isn't _immediately_ available or updated is that GitHub has to do some work to prepare the site for you. You can actually see the work that it does in your repository's **Actions**.
This is the page that you come to if you click on the "Project 1" link from the home page.
Note that there's nothing special about that file name. In fact, when you're adding actual projects, I'd encourage you to use sensible, relevant file names. The main guidance I'd give about file names for markdown files is that you should stick to lowercase letters, numbers, dashes, and underscores -- this is mainly by convention.
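If you'd like to check a file name against that convention, here's a minimal sketch in Python. The function name `is_conventional_name` and the assumption that every page ends in `.md` are mine, for illustration only; nothing in the template requires this check.

```python
import re

# Hypothetical helper: accepts names made only of lowercase letters, digits,
# dashes, and underscores, followed by a ".md" extension (an assumption here).
_NAME_PATTERN = re.compile(r"[a-z0-9_-]+\.md")


def is_conventional_name(filename: str) -> bool:
    """Return True if the whole file name follows the convention above."""
    return _NAME_PATTERN.fullmatch(filename) is not None


if __name__ == "__main__":
    for name in ["dynamic-pricing.md", "Project 1.md", "parking_model_v2.md"]:
        print(name, is_conventional_name(name))
```

A name like "Project 1.md" is rejected because of the capital letter and the space; both tend to cause trouble in URLs.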
Let's take a tour
We'll come back to YAML, as there are a number of other settings for our site that are also written in YAML.
There are a couple links to syntax guides for Markdown in the "Additional Resources" section.
For the sake of organization, I recommend placing all images relevant to a particular project page under an assets/img/projects/proj-name/ folder.
A good commit message doesn't have to be long, but it should describe the change that you made, and ideally why you made it. These messages will create a comprehensive log for you and others in the future.
Values can have different types. They can be strings, numbers, booleans, lists, objects, or null. You can find more about YAML in the Additional Resources section. For now, you don't have to know the mechanics of how these values actually get used in the page; you'll just have to know where to set them.
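To make those types concrete, here's a small sketch in Python. YAML 1.2 is designed as a superset of JSON, so the text below is valid YAML written in its JSON-style form, and I'm using Python's built-in `json` module as a stand-in for a YAML parser. The keys (`title`, `order`, and so on) are made up for illustration and aren't necessarily settings the template uses.

```python
import json

# JSON-style text that is also valid YAML 1.2. All keys are hypothetical.
settings_text = """
{
  "title": "Dynamic Parking Pricing",
  "order": 2,
  "published": false,
  "tags": ["R", "transportation"],
  "author": {"name": "Example Student"},
  "thumbnail": null
}
"""

settings = json.loads(settings_text)

# Show which Python type each value was parsed as.
for key, value in settings.items():
    print(f"{key}: {type(value).__name__}")
```

In everyday YAML you'd more often see the indented `key: value` form, but the value types are the same.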
The first thing you'll do with your code is clone it. This is essentially downloading your code, but in the folder where your code is, git will also store information about where the code came from. So it's not just the code that gets downloaded to your computer, but also metadata about changes that have been made to that code over time and where the code can be pushed to when new changes are made.
Once you've cloned your code, the three most common things that you'll end up doing through git are: [READ THE SLIDE]
Each git client has a slightly different way of doing each of these, and they're all fine, but I encourage you to get very comfortable with the way that your git client has you add changes, commit those changes, and push them up to GitHub.
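If your git client is the command line, the add and commit steps might look roughly like the sketch below, which drives the `git` program from Python in a throwaway folder. The file name, commit message, and author details are placeholders, and I've left out `git push` because this scratch repository has no copy on GitHub to push to.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def demo_add_commit():
    """Create a scratch repo, then add and commit one file.

    Returns the one-line commit log, or None if git isn't installed.
    """
    if shutil.which("git") is None:
        return None

    with tempfile.TemporaryDirectory() as folder:
        def git(*args):
            # Placeholder identity so the commit works on any machine.
            cmd = [
                "git",
                "-c", "user.name=Example",
                "-c", "user.email=example@example.com",
                "-c", "commit.gpgsign=false",
                *args,
            ]
            result = subprocess.run(
                cmd, cwd=folder, check=True, capture_output=True, text=True
            )
            return result.stdout

        git("init")
        Path(folder, "project-1.md").write_text("# Project 1\n")
        git("add", "project-1.md")                 # stage the change
        git("commit", "-m", "Add project 1 page")  # record it with a message
        return git("log", "--oneline").strip()


if __name__ == "__main__":
    print(demo_add_commit())
```

With a repository cloned from GitHub, the same add and commit steps would be followed by `git push` to send the new commit up.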
Conflicts happen when git doesn't know what action should be taken, for example when two branches change the same lines of a file in different ways.