Document Version Control with Git

Before we start…​

nocloud

What is version control?

  • Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later.

  • It has existed in some form for almost as long as writing itself (e.g. numbered document versions)

  • Today, the most capable (as well as complex) revision control systems are those used in software development.

Why?

  • Revert files back to a previous state

  • "Freeze" important versions of a document

  • Compare changes over time

  • Track progress of a project

  • See who modified something, and when

Modern version control systems

  • Remote backup of files

  • Powerful tool for collaboration

Git

  • Developed by Linus Torvalds in 2005

  • Created for the development of the Linux kernel:

    • ~63000 files

    • Roughly 15,600 developers from more than 1,400 companies

Characteristics

  • Free and open source

  • Distributed

  • Powerful and flexible

  • Learning curve can be steep

xkcd git

How does it work?

architecture

Installation

Package managers are strongly recommended!

Creating a remote repository

README and .gitignore

Every repository should have these 2 files:

  • README: project description and useful information

  • .gitignore: special file telling Git which files should not be tracked
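As a rough illustration of how a .gitignore behaves, the sketch below builds a throwaway repository and lists what Git would consider tracking. The file names and the two ignore patterns (`*.log`, `build/`) are made up for the example and are not taken from any real project.

```shell
set -e
# Work in a throwaway directory so nothing else on disk is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q

# README: describes the project (hypothetical content).
printf '# My thesis\n\nNotes and sources for the document.\n' > README.md

# .gitignore: one pattern per line; these patterns are examples only.
printf '*.log\nbuild/\n' > .gitignore

# One file that should be tracked and two that should not.
echo 'Chapter 1' > chapter1.txt
echo 'debug output' > latex.log
mkdir build
echo 'generated' > build/thesis.pdf

# List the untracked files Git would offer to add; ignored files are absent.
untracked=$(git status --porcelain --untracked-files=all)
echo "$untracked"
```

The listing should mention README.md, .gitignore and chapter1.txt, but neither latex.log nor anything under build/.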

Workflow

git workflow

Copying a remote repository: clone

  • git clone repository

  • Copies the remote repository, including its full history, into a new local repository
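A minimal sketch of cloning, assuming no real server is available: a local repository under a temporary directory plays the role of the remote. With a hosted repository the path would simply be replaced by its URL.

```shell
set -e
work=$(mktemp -d)

# Stand-in for a remote repository: a local repository with one commit.
git init -q "$work/remote"
echo 'Project description' > "$work/remote/README"
git -C "$work/remote" add README
git -C "$work/remote" -c user.name='Example Author' \
    -c user.email='author@example.com' commit -q -m 'Add README'

# Clone it: this copies the files and the full history.
git clone -q "$work/remote" "$work/local"

# The clone remembers where it came from under the name "origin".
git -C "$work/local" remote -v
git -C "$work/local" log --oneline
```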

Staging changes (local)

  • git add files

  • Adds the changes into the local staging area

Saving changes: commit (local)

  • git commit -m "message"

  • Saves the changes in the staging area into the repository

  • Creates a "snapshot" of the current state of one or more files

  • A message describing the changes must be provided
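The staging and commit steps can be sketched as follows. The repository, file names and identity are placeholders; the commit message is passed with the `-m` option.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
# Identity stored in this repository only (placeholder values).
git config user.name 'Example Author'
git config user.email 'author@example.com'

echo 'First draft' > report.txt
echo 'Scratch notes' > notes.txt

# Stage only report.txt; notes.txt stays out of the next commit.
git add report.txt

# Save the staged snapshot with a descriptive message.
git commit -q -m 'Add first draft of the report'

git log --oneline      # one commit
git status --short     # notes.txt is still untracked
```

Only what was staged ends up in the snapshot, which is why notes.txt still shows up as untracked afterwards.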

History and revert (local)

  • git log files

  • Shows the history of commits that modified the given files

  • git revert commit

  • Creates a new commit that undoes the changes introduced by the given commit; earlier history is left intact
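A small sketch of inspecting history and reverting, using a throwaway repository with a hypothetical report.txt. Here `git revert --no-edit` is used so that Git records the undoing commit directly with its default message instead of opening an editor.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Example Author'       # placeholder identity
git config user.email 'author@example.com'

echo 'Introduction' > report.txt
git add report.txt
git commit -q -m 'Add introduction'

echo 'A paragraph we will regret' >> report.txt
git commit -q -a -m 'Add doubtful paragraph'

# History of the commits that touched report.txt, newest first.
git log --oneline -- report.txt

# Undo the latest commit by recording a new commit with the inverse change.
git revert --no-edit HEAD

git log --oneline -- report.txt
cat report.txt
```

After the revert the file contains only the introduction again, while the log shows three commits: the two originals plus the one that undoes the second.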

Uploading to the remote repository: push

  • git push

  • Uploads the commits in the local repository to the remote one

Downloading from the remote repository: pull

  • git pull

  • Fetches the changes from the remote repository and merges them into the local one

  • Merging can produce conflicts; Git will ask us to resolve them and commit the result
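The push and pull round trip can be sketched with two local clones sharing one bare repository that stands in for the remote. The collaborator names (alice, bob), the file and the identity are all hypothetical, and the example deliberately avoids conflicts.

```shell
set -e
work=$(mktemp -d)

# Placeholder identity for the commits made below.
export GIT_AUTHOR_NAME='Example Author' GIT_AUTHOR_EMAIL='author@example.com'
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Stand-in for the remote: a bare repository on the local disk.
git init -q --bare "$work/remote.git"

# Alice clones the (still empty) remote, commits and pushes.
git clone -q "$work/remote.git" "$work/alice"
cd "$work/alice"
echo 'Meeting notes' > notes.txt
git add notes.txt
git commit -q -m 'Add meeting notes'
git push -q -u origin HEAD   # first push: also sets the upstream branch

# Bob clones the remote and gets Alice's first commit.
git clone -q "$work/remote.git" "$work/bob"

# Alice publishes another change.
echo 'Action items' >> notes.txt
git commit -q -a -m 'Add action items'
git push -q

# Bob downloads it and merges it into his copy.
cd "$work/bob"
git pull -q
cat notes.txt
```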

Branching

version control flow

Other (advanced) stuff

  • tags

  • partial reverts

  • change history

  • …​

Docs as code

  • Source code is only a small part of the documents a project must handle

  • Still, version control and remote collaboration are needed for all the documents

  • In recent years there has been a strong push to treat documents the same way as source code

Advantages

  • Working in plain text files (rather than binary file formats like Word)

  • Collaborating using version control such as git and GitHub

  • Storing docs in the same repositories as the programming code itself

  • Versioning docs through git tags/releases (rather than duplicating all the files to archive each release)

  • Generating other formats or websites without modifying the document
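Versioning a document through tags, rather than keeping copies of each release, might look like the sketch below. The tag name `v1.0`, the file manual.txt and the identity are assumptions made for the example.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Example Author'       # placeholder identity
git config user.email 'author@example.com'

echo 'Manual, first release' > manual.txt
git add manual.txt
git commit -q -m 'Write manual for release 1.0'

# Mark this commit as release 1.0 with an annotated tag.
git tag -a v1.0 -m 'Manual as shipped with release 1.0'

# Work continues after the release.
echo 'Manual, work in progress' > manual.txt
git commit -q -a -m 'Start updating manual'

git tag --list
# Print the file exactly as it was at the tag, without touching the working copy.
git show v1.0:manual.txt
```

The released text stays retrievable through the tag even though the working copy has moved on.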

Just a little problem…​

  • The most common document formats (Word, PDF…) are binary files

  • Git is text-based: it can store binary files, but it cannot show meaningful differences between versions or merge them
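To make the limitation concrete, the sketch below changes the same figure in a text file and in a pretend binary file, then asks Git for a per-file line count of the changes. The .docx here is a fake: just a few bytes including a NUL byte, which is enough for Git to treat the file as binary.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Example Author'       # placeholder identity
git config user.email 'author@example.com'

echo 'The budget is 100 EUR.' > report.txt
printf 'PK\000\003\004 budget 100' > report.docx   # fake binary content
git add report.txt report.docx
git commit -q -m 'Add report in two formats'

# Make the same edit in both files.
echo 'The budget is 250 EUR.' > report.txt
printf 'PK\000\003\004 budget 250' > report.docx

# Added/removed line counts per file; binary files only get "-" placeholders.
git diff --numstat
```

For report.txt Git reports one line added and one removed, and a plain `git diff` would show the old and new sentence. For report.docx it can only say that the file differs.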

Solutions?

  • Markup languages:

  • Markup languages are ways of annotating an electronic document.

  • Usually markup will either specify how something should be displayed or what something means.

    • HTML, XML, LaTeX…

Markup languages

  • Documents are written in plain text, then a program converts them into the final document

  • The same source can be used to generate files in other formats: LaTeX, Word, PDF or even slides

  • Formatting is done by the computer, so the output is always consistent

  • Fast and light

  • Can be used in version control systems

Markup languages: Advanced features

  • Automatic generation of documents

  • Inline comments (not rendered in the final document)

  • Split one document into several files. Ex: main document, chapters and bibliography

  • Code executed and plots rendered in the document

LaTeX

  • Extensively used for technical papers

  • Beautiful generated documents

  • Very powerful…​

  • …​and very heavy

  • Setup and document customization are complex

LaTeX: example

\documentclass{article}
\usepackage{graphicx}

\begin{document}

\title{Introduction to \LaTeX{}}
\author{Author's Name}

\maketitle

\begin{abstract}
The abstract text goes here.
\end{abstract}

\section{Introduction}
Here is the text of your introduction.

\begin{equation}
    \label{simple_equation}
    \alpha = \sqrt{ \beta }
\end{equation}

\subsection{Subsection Heading Here}
Write your subsection text here.

\begin{figure}
    \centering
    \includegraphics[width=3.0in]{myfigure}
    \caption{Simulation Results}
    \label{simulationfigure}
\end{figure}

\section{Conclusion}
Write your conclusion here.

\end{document}

LaTeX: example II

\documentclass[12pt]{article}
\usepackage{lingmacros}
\usepackage{tree-dvips}
\begin{document}

\section*{Notes for My Paper}

Don't forget to include examples of topicalization.
They look like this:

{\small
\enumsentence{Topicalization from sentential subject:\\
\shortex{7}{a John$_i$ [a & kltukl & [el &
  {\bf l-}oltoir & er & ngii$_i$ & a Mary]]}
{ & {\bf R-}clear & {\sc comp} &
  {\bf IR}.{\sc 3s}-love   & P & him & }
{John, (it's) clear that Mary loves (him).}}
}

\subsection*{How to handle topicalization}

I'll just assume a tree structure like (\ex{1}).

{\small
\enumsentence{Structure of A$'$ Projections:\\ [2ex]
\begin{tabular}[t]{cccc}
    & \node{i}{CP}\\ [2ex]
    \node{ii}{Spec} &   &\node{iii}{C$'$}\\ [2ex]
        &\node{iv}{C} & & \node{v}{SAgrP}
\end{tabular}
\nodeconnect{i}{ii}
\nodeconnect{i}{iii}
\nodeconnect{iii}{iv}
\nodeconnect{iii}{v}
}
}

\subsection*{Mood}

Mood changes when there is a topic, as well as when
there is WH-movement.  \emph{Irrealis} is the mood when
there is a non-subject topic or WH-phrase in Comp.
\emph{Realis} is the mood when there is a subject topic
or WH-phrase.

\end{document}

LaTeX alternative: LyX

  • WYSIWYM ("what you see is what you mean") LaTeX editor

  • Documents are saved as .lyx, a plain-text format that can be exported to LaTeX

  • Can be used together with version control

  • Provides, by default, templates for many of the biggest scientific journals

LyX: example

[Figure: LyX screenshot]

LyX: example II

[Figure: LyX screenshot (LilyPond example)]

Lightweight markup languages

  • Also called plain-text markup or humane markup languages

  • Provide a way of formatting the document while the source remains readable

  • Widely used on websites and code documentation

LML: current options

  • Markdown

  • reStructuredText (rst)

  • AsciiDoc

Markdown

  • Created for minimal formatting of web text

  • Used everywhere: web, Jupyter notebooks, R Markdown…

  • There is no single standard; many flavours currently exist (GitHub, CommonMark, Pandoc)

  • Originally not intended for full documents, so very limited

  • Different flavours and tools try to overcome this limitation

    • (+ pandoc)

Markdown: example

[Figure: Markdown quick tour example]

AsciiDoc

  • Developed for book creation.

  • Limited number of users

  • Standardized and extensible, great documentation

  • Limited resources mean that bugs and feature requests take time to be addressed

reStructuredText

  • Originally intended for Python documentation

  • Medium-sized but very tech-savvy community

  • Syntax differs somewhat from the other two

  • Very powerful and extensible

Which one to use?

  • Note-taking:

    • Markdown

    • AsciiDoc

    • reStructuredText

  • Anything more serious:

    • reStructuredText

    • LaTeX/LyX

Resources

choco install git vscode pandoc

Questions?