Document Version Control with GIT
Before we start…
What is version control?
-
Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later.
-
Has existed for almost as long as writing itself (e.g. numbered versions of a document)
-
Today, the most capable (as well as complex) revision control systems are those used in software development.
Why?
-
Revert files back to a previous state
-
"Freeze" important versions of a document
-
Compare changes over time
-
Track progress of a project
-
See who modified something, and when
Modern version control systems
-
Remote backup of files
-
Powerful tool for collaboration
GIT
-
Developed by Linus Torvalds in 2005
-
The Linux kernel:
-
~63000 files
-
Roughly 15,600 developers from more than 1,400 companies
Characteristics
-
Free and open source
-
Distributed
-
Powerful and flexible
-
Learning curve can be steep!
How does it work?
Installation
Installing through a package manager is highly recommended!
Creating a remote repository
-
register at the remote git server
-
create repository
-
add participants' SSH public keys
-
clone the repository to your machine
README and .gitignore
Every repository should have these 2 files:
-
README: project description and useful information
-
.gitignore: special file telling Git which files must not be tracked
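As a rough illustration of the two files above, the sketch below creates both in a throwaway repository and checks that an ignored file stays out of Git's view. The file names and ignore patterns are made up for the example.

```shell
set -e
# Work in a temporary directory so nothing real is touched.
demo=$(mktemp -d)
cd "$demo"
git init -q

# README: a short project description (contents are illustrative).
printf 'Demo project\n\nHow to build the report.\n' > README

# .gitignore: one pattern per line (example patterns only).
printf '*.log\nbuild/\n' > .gitignore

# One file that matches an ignore pattern and one that does not.
touch debug.log report.tex

# Untracked files are listed by git status; debug.log is left out.
status=$(git status --porcelain)
echo "$status"
```

`git check-ignore debug.log` can be used to confirm which files a pattern catches.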
workflow
copying remote repository: clone
-
git clone repository
-
Copies the remote repository, including its full history, into a new local directory
staging changes (local)
-
git add files
-
Adds the changes to the local staging area
Saving changes: commit (local)
-
git commit -m "message"
-
Saves the changes in the staging area into the repository
-
Creates a "snapshot" of the current state of one or more files
-
A message describing the changes must be provided
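A minimal sketch of the stage-then-commit cycle, run in a temporary repository. The file name, the commit messages, and the user name and e-mail are placeholders chosen for the example.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
# Commits need an identity; these values are placeholders for the demo.
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "First draft" > notes.txt
git add notes.txt                          # stage the new file
git commit -q -m "Add first draft of notes"  # save a snapshot

echo "Second paragraph" >> notes.txt
git add notes.txt                          # stage the modification
git commit -q -m "Extend notes with a second paragraph"

# Count the snapshots stored so far.
count=$(git rev-list --count HEAD)
echo "commits: $count"
```

Each commit stores only what was staged, so unrelated edits can be left out of a snapshot by not adding them.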
history and revert (local)
-
git log files
-
returns a history of the file modifications
-
git revert commit
-
undoes the changes introduced by one or more commits by recording new commits that reverse them; the original commits stay in the history
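The two commands above can be tried out as follows. This is only a sketch in a temporary repository: the file, the messages and the identity are invented, and `--no-edit` is used so that `git revert` accepts its default message instead of opening an editor.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo "good line" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

echo "bad line" >> notes.txt
git add notes.txt
git commit -q -m "Add a line we will regret"

# History of the file, newest commit first.
git log --oneline -- notes.txt

# Undo the latest commit by recording a new commit that reverses it.
git revert --no-edit HEAD > /dev/null

content=$(cat notes.txt)
count=$(git rev-list --count HEAD)
echo "$content ($count commits)"
```

The history now holds three commits: the two originals plus the one that reverses the second.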
upload to remote repository: push
-
git push
-
Uploads the state of the local repository to the remote one
Download from remote repository: pull
-
git pull
-
Fetches the changes from the remote repository and merges them into the local one
-
Merging can produce conflicts; Git then asks us to resolve them and commit the result
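The push and pull steps can be rehearsed without a server. In the sketch below a bare repository on the local disk stands in for the remote, and two clones (hypothetically named `alice` and `bob`) play two collaborators; all names and contents are assumptions made for the demo.

```shell
set -e
work=$(mktemp -d)
cd "$work"
# A bare repository on disk plays the role of the remote server.
git init -q --bare remote.git

# First collaborator: clone (a warning about an empty repository is expected),
# commit, and push.
git clone -q remote.git alice
cd alice
git config user.name "Alice"              # placeholder identity
git config user.email "alice@example.com"
echo "chapter 1" > doc.txt
git add doc.txt
git commit -q -m "Add chapter 1"
git push -q -u origin HEAD                # upload and remember the upstream
cd ..

# Second collaborator clones the now non-empty remote.
git clone -q remote.git bob

# Alice publishes more work...
cd alice
echo "chapter 2" >> doc.txt
git commit -q -am "Add chapter 2"
git push -q
cd ../bob

# ...and Bob downloads and merges it.
git pull -q
last=$(tail -n 1 doc.txt)
echo "$last"
```

Here the pull is a simple fast-forward; a conflict would only arise if Bob had committed a different change to the same lines.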
Branching
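A small sketch of what branching looks like in practice, assuming a temporary repository and an invented branch name (`draft-methods`): work happens on a side branch and is merged back once it is ready.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo "Introduction" > paper.txt
git add paper.txt
git commit -q -m "Add introduction"

# Remember the starting branch (its name depends on the Git configuration).
base=$(git symbolic-ref --short HEAD)

# Create a side branch and commit on it.
git checkout -q -b draft-methods
echo "Methods" >> paper.txt
git commit -q -am "Draft methods section"

# Return to the starting branch and merge the finished draft.
git checkout -q "$base"
git merge -q draft-methods

merged=$(tail -n 1 paper.txt)
echo "$merged"
```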
other (advanced) stuff
-
tags
-
partial reverts
-
change history
-
…
Docs as code
-
Source code is only a small part of the documents a project must handle
-
Still, version control and remote collaboration are needed for all the documents
-
In recent years there has been a big push to treat documents the same way as source code
Advantages
-
Working in plain text files (rather than binary file formats like Word)
-
Collaborating through version control tools such as Git and GitHub
-
Storing docs in the same repositories as the programming code itself
-
Versioning docs through git tags/releases (rather than duplicating all the files to archive each release)
-
Generating other formats or websites without modifying the document
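To illustrate versioning docs through tags, the sketch below tags the documentation of one release and later reads it back without keeping a duplicate copy. The tag name `v1.0`, the file and its contents are assumptions for the example.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo "Manual for release 1.0" > manual.txt
git add manual.txt
git commit -q -m "Manual for release 1.0"

# Mark this exact state of the docs with an annotated tag.
git tag -a v1.0 -m "Documentation as shipped with release 1.0"

# Work continues on the same file.
echo "Manual for release 2.0 (in progress)" > manual.txt
git commit -q -am "Start manual for release 2.0"

# Read the 1.0 manual straight from the tag.
old=$(git show v1.0:manual.txt)
echo "$old"
```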
Just a little problem…
-
The most common document formats (Word, PDF…) are binary files
-
Git can store them, but its diff and merge tools are text-based and cannot show or merge changes inside binary files
Solutions?
-
Markup languages:
-
Markup languages are ways of annotating an electronic document.
-
Usually markup will either specify how something should be displayed or what something means.
-
html, xml, latex…
Markup languages
-
Documents are written in plain text, then a program converts them into the final document
-
The same document can be used to generate files in other formats: LaTeX, Word, PDF or even slides
-
Formatting is done by the computer, so the output is always consistent
-
Fast and light
-
Can be used in version control systems
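The last point can be seen directly: because the source is plain text, Git can report exactly which line changed. The sketch below uses a tiny LaTeX fragment with invented contents in a temporary repository.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

# A three-line plain-text source file (illustrative contents).
cat > report.tex <<'EOF'
\section{Results}
The results are preliminary.
They were collected in March.
EOF
git add report.tex
git commit -q -m "Add report"

# Change a single word in one line.
sed 's/preliminary/final/' report.tex > report.tmp
mv report.tmp report.tex

# The diff marks the old line with "-" and the new line with "+".
changes=$(git diff -- report.tex)
echo "$changes"
```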
Markup languages: Advanced features
-
Automatic generation of documents
-
Inline comments (not rendered in the final document)
-
Split one document into several files. Ex: main document, chapters and bibliography
-
Code executed and plots rendered in the document
Latex
-
Extensively used for technical papers
-
Beautiful generated documents
-
Very powerful…
-
…and very heavy
-
Setup and document customization are complex
Latex: example
\documentclass{article}
\usepackage{graphicx}

\begin{document}

\title{Introduction to \LaTeX{}}
\author{Author's Name}

\maketitle

\begin{abstract}
The abstract text goes here.
\end{abstract}

\section{Introduction}
Here is the text of your introduction.

\begin{equation}
    \label{simple_equation}
    \alpha = \sqrt{ \beta }
\end{equation}

\subsection{Subsection Heading Here}
Write your subsection text here.

\begin{figure}
    \centering
    \includegraphics[width=3.0in]{myfigure}
    \caption{Simulation Results}
    \label{simulationfigure}
\end{figure}

\section{Conclusion}
Write your conclusion here.

\end{document}
Latex: example II
\documentclass[12pt]{article}
\usepackage{lingmacros}
\usepackage{tree-dvips}
\begin{document}

\section*{Notes for My Paper}

Don't forget to include examples of topicalization.
They look like this:

{\small
\enumsentence{Topicalization from sentential subject:\\
\shortex{7}{a John$_i$ [a & kltukl & [el &
  {\bf l-}oltoir & er & ngii$_i$ & a Mary]]}
{ & {\bf R-}clear & {\sc comp} &
  {\bf IR}.{\sc 3s}-love & P & him & }
{John, (it's) clear that Mary loves (him).}}
}

\subsection*{How to handle topicalization}

I'll just assume a tree structure like (\ex{1}).

{\small
\enumsentence{Structure of A$'$ Projections:\\ [2ex]
\begin{tabular}[t]{cccc}
    & \node{i}{CP}\\ [2ex]
    \node{ii}{Spec} & &\node{iii}{C$'$}\\ [2ex]
    &\node{iv}{C} & & \node{v}{SAgrP}
\end{tabular}
\nodeconnect{i}{ii}
\nodeconnect{i}{iii}
\nodeconnect{iii}{iv}
\nodeconnect{iii}{v}
}
}

\subsection*{Mood}

Mood changes when there is a topic, as well as when
there is WH-movement. \emph{Irrealis} is the mood when
there is a non-subject topic or WH-phrase in Comp.
\emph{Realis} is the mood when there is a subject topic
or WH-phrase.

\end{document}
Latex alternative: Lyx
-
WYSIWYM ("what you see is what you mean") graphical front end for LaTeX
-
Documents are saved as .lyx, a plain-text format of its own that LyX exports to LaTeX
-
Can be used together with version control
-
Provides, by default, templates for many of the biggest scientific journals
Lyx: example
Lyx: example II
Lightweight Markup languages
-
Also called plain-text markup or humane markup languages
-
Provide a way of formatting the document while keeping the source readable
-
Widely used on websites and code documentation
LML: current options
-
Markdown
-
reStructuredText (rst)
-
Asciidoc
Markdown
-
Created for minimal formatting of web text
-
Used everywhere: the web, Jupyter notebooks, R Markdown…
-
There is no single standard; many flavors currently exist (GitHub, CommonMark, Pandoc)
-
Originally not intended for full documents, so very limited
-
Different flavors and tools try to overcome this limitation
-
(+ pandoc)
Markdown: example
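As a possible stand-in for the example, the sketch below writes a small Markdown file from the shell; the file name and contents are invented. The conversion step is optional and only runs if `pandoc` happens to be installed.

```shell
set -e
dir=$(mktemp -d)
cd "$dir"

# A short Markdown document: heading, emphasis, list and link.
cat > notes.md <<'EOF'
# Meeting notes

Some *emphasized* and **strong** text, plus `inline code`.

- first item
- second item

[A link](https://example.com)
EOF

# Optional: convert to HTML when pandoc is available on this machine.
if command -v pandoc >/dev/null 2>&1; then
    pandoc notes.md -o notes.html
fi

heading=$(head -n 1 notes.md)
echo "$heading"
```

The source stays readable as plain text even before any conversion, which is the main appeal of lightweight markup.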
Asciidoc
-
Developed for book creation.
-
Limited number of users
-
Standardized and extensible, great documentation
-
Lack of resources means that bug fixes and feature requests take time
reStructuredText
-
Originally intended for Python documentation
-
Medium-sized but very tech-savvy community
-
Syntax is a little different from the other two
-
Very powerful and extensible
Which one to use?
-
Notetaking:
-
Markdown
-
Asciidoc
-
reStructuredText
-
Anything more serious:
-
reStructuredText
-
Latex/Lyx
Resources
choco install git vscode pandoc