Day 1 (4 hours)
  12:15-4:30 pm with lunch and coffee breaks
  intro (20 minutes)
    bus factor
    introductions
      what's your git-annex skill level?
      your Haskell skill level?
        Haskell for Readers http://joeyh.name/talks/git-annex-developer-workshop/haskell-for-readers
    schedule overview
  building git-annex from source (10 minutes)
    git clone git://git-annex.branchable.com/ git-annex
    sudo apt-get install haskell-stack (or see http://haskellstack.org/)
    stack build (older OS: stack build --stack-yaml stack-lts-9.9.yaml)
  git-annex core concepts and types (60 minutes)
    Key
      Types/Key.hs (aka Types.Key)
      key/value storage
      not an encryption key
      annex symlinks and links
        show examples
    UUID
      Types.UUID
      unique identifier for a git repository / special remote
      does a normal git repository have an identifier?
        no
        any clone is much like any other
        git repositories may contain unique data, but then no other repo knows about it
      why does git-annex need a repository identifier?
        each clone may contain the contents of a different set of files
        git-annex needs to know where the contents of a file are located
      NoUUID
        ugly and hides problems
        thought exercise
          imagine getting rid of the NoUUID constructor
          use Maybe UUID instead
          eliminate the Maybe where possible
    Remote
      Types.Remote
      git remote
      special remote
      important fields of the Remote data type
        uuid
        cost
        storeKey
        retrieveKeyFile
        removeKey
        checkPresent
      compare Remote with the external special remote protocol
        http://git-annex.branchable.com/design/external_special_remote_protocol/
  interlude: how we use git-annex (30-60 minutes)
    me
      I built it for my own personal use
      glimpse inside some of my git-annex repos and workflows
    how datalad uses git-annex
      over to yarik and michael
    how others here use git-annex
  git-annex core concepts continued (30 minutes)
    recap
      Key
      UUID
      Remote
    git-annex branch
      http://git-annex.branchable.com/internals/
      union merging
      CRDTs & vector clocks
        Annex.VectorClock
      example: location tracking
        Logs.Presence.Pure
          LogLine
          LogStatus
          logParser
          buildLog
      exercise
        design a new git-annex branch file format
        how does it interact with union merging?
        how are old values removed from the file?
        space efficiency
          repeated uuids, timestamps, etc.
          git gc
    what core git-annex concepts *don't* have types?
      git-annex branch files
      git-annex object files
      would adding types for these improve the code?
  case study: adding a new, major feature to git-annex (60 minutes)
    git-annex import tree
      http://git-annex.branchable.com/design/importing_trees_from_special_remotes/
    high level design
      dual of export tree
        after importing, exporting the same tree is a no-op
        after exporting, importing yields the same tree
      lists the contents of the special remote
      downloads new content from the special remote (necessary to generate keys?)
      builds a git tree of its contents
      potential for export tree conflicts
        important sticking point in the design
        mitigations
          race safety via a good ContentIdentifier
          S3 versioning
    UI
      analogy to git fetch / git push
      import tree and git fetch both update the remote tracking branch
        refs/remotes/$name/master
      git push and export tree also update the remote tracking branch to reflect their changes to the remote
      an import/export special remote becomes similar to a git working tree
        without .git
        but with files that may be modified there and later imported
    API design
      ImportActions
      data types
        ImportLocation
        ContentIdentifier
      storage
        git-annex branch
          Logs.ContentIdentifier
          mappings between ContentIdentifier and Key
        local sqlite database
          Database.ContentIdentifier
      ImportableContents
      RemoteTrackingBranch
      ImportTreeConfig
      ImportCommitConfig
    option parsing
      Parser ImportOptions
      added RemoteImportOptions
      optparse-applicative
  planning for tomorrow (10 minutes)
    start thinking about a feature you'd like to see in git-annex
    or a part of the git-annex implementation you want to explore
    to discuss tomorrow morning

Day 2 (4 hours)
  9 am-1 pm with coffee and lunch breaks
  git-annex implementation details (60 minutes)
    not as core as Key, UUID, Remote, but all over the code
    Git
      Repo
      Ref
      Branch
      Sha
      Tag
      exercise
        Ref, Branch, Sha, Tag are all type aliases
        not type safe!
        no distinction at all between Commit, Tree, Object
        split into separate types for type safety
          perhaps Ref Tag, Ref Commit, Ref Tree, Ref Object
          and Sha Tag, Sha Commit, Sha Tree, Sha Object
      why is the Git interface in git-annex at all?
    GitConfig
      exercise
        add a new git config value to it
    Annex monad
      "global" state
      gitRepo
      getGitConfig
      remoteList
    Types.Command
  discussion: designing new git-annex features (120 minutes)
    discuss participants' feature ideas and think about designs
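For the Key item (Types/Key.hs), here is a minimal Haskell sketch of a key and its string form. It simplifies heavily. Only a backend name, an optional size and a name are modeled, loosely following the `SHA256E-s<size>--<name>` shape of git-annex keys. The real Types.Key has more fields and different field types, and every name below is illustrative rather than git-annex's own.

```haskell
import Data.List (isPrefixOf)
import Text.Read (readMaybe)

-- Simplified stand-in for Types.Key: only backend, size and name are modeled.
data Key = Key
  { keyVariety :: String        -- backend name, e.g. "SHA256E" or "WORM"
  , keySize    :: Maybe Integer -- content size in bytes, when known
  , keyName    :: String        -- hash (plus extension for *E backends) or other name
  } deriving (Eq, Show)

-- Render as "<backend>[-s<size>]--<name>".
serializeKey :: Key -> String
serializeKey k = keyVariety k ++ sizeField ++ "--" ++ keyName k
  where
    sizeField = maybe "" (\s -> "-s" ++ show s) (keySize k)

-- Parse the same form back; anything unexpected yields Nothing.
deserializeKey :: String -> Maybe Key
deserializeKey s = do
  (fields, name) <- splitOnFirst "--" s
  case splitOn '-' fields of
    (variety : rest) | not (null variety) -> do
      size <- parseSize rest
      Just (Key variety size name)
    _ -> Nothing

-- Split at the first occurrence of a separator string.
splitOnFirst :: String -> String -> Maybe (String, String)
splitOnFirst sep = go []
  where
    go acc rest
      | sep `isPrefixOf` rest = Just (reverse acc, drop (length sep) rest)
    go acc (c : cs) = go (c : acc) cs
    go _ [] = Nothing

-- Split on every occurrence of a delimiter character.
splitOn :: Char -> String -> [String]
splitOn d str = case break (== d) str of
  (a, [])    -> [a]
  (a, _ : b) -> a : splitOn d b

-- This sketch only understands an optional "s<bytes>" field.
parseSize :: [String] -> Maybe (Maybe Integer)
parseSize []             = Just Nothing
parseSize ['s' : digits] = Just <$> readMaybe digits
parseSize _              = Nothing
```

The round trip `deserializeKey (serializeKey k) == Just k` depends on the backend name containing no `-`, which is true for names like SHA256E and WORM.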
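The NoUUID thought exercise can be tried on a toy model. This sketch assumes a UUID that simply wraps a String, which is not the real Types.UUID. The names `OldUUID`, `toUUID`, `recordOld`, `recordNew` and `recordAll` are hypothetical. The sketch shows why a sentinel constructor hides problems: two unrelated repositories without identifiers end up sharing one map entry. Moving the absence into `Maybe UUID` and resolving it once, at the boundary, keeps the rest of the code free of Maybe.

```haskell
import qualified Data.Map.Strict as M
import Data.Maybe (mapMaybe)

-- "Before": a sentinel constructor for a missing identifier.
data OldUUID = NoUUID | OldUUID String
  deriving (Eq, Ord, Show)

-- "After": every value of this type is a real repository identifier.
newtype UUID = UUID String
  deriving (Eq, Ord, Show)

-- Where a UUID can genuinely be absent, the type says so.
toUUID :: String -> Maybe UUID
toUUID "" = Nothing
toUUID s  = Just (UUID s)

-- Old style: record that a key's content is in a repository.
-- Unrelated repositories without UUIDs silently share the NoUUID entry.
recordOld :: OldUUID -> String -> M.Map OldUUID [String] -> M.Map OldUUID [String]
recordOld u key = M.insertWith (++) u [key]

-- New style: only real UUIDs can be recorded.
recordNew :: UUID -> String -> M.Map UUID [String] -> M.Map UUID [String]
recordNew u key = M.insertWith (++) u [key]

-- Resolve the Maybe once at the boundary (here by skipping rows without a UUID),
-- so everything downstream works with plain UUIDs.
recordAll :: [(String, String)] -> M.Map UUID [String]
recordAll rows = foldr (\(u, k) -> recordNew u k) M.empty
  (mapMaybe (\(s, k) -> fmap (\u -> (u, k)) (toUUID s)) rows)
```

Dropping rows without a UUID in `recordAll` is itself a policy choice. The point of the typed version is that the compiler makes someone choose explicitly.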
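For the important fields of the Remote data type, here is a hypothetical, much simplified record, plus a toy remote that keeps content in memory. In git-annex these fields run in the Annex monad and take more arguments. This sketch uses plain IO, and `memoryRemote` and `cheapestFirst` are illustrative names. For the comparison with the external special remote protocol, the fields roughly line up with that protocol's TRANSFER STORE, TRANSFER RETRIEVE, REMOVE and CHECKPRESENT requests.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)
import Data.List (sortOn)
import qualified Data.Map.Strict as M

type Key = String
type UUID = String

-- Hypothetical, simplified record with the fields named in the outline.
data Remote = Remote
  { uuid            :: UUID
  , cost            :: Int                         -- lower cost is preferred
  , storeKey        :: Key -> FilePath -> IO Bool  -- ~ TRANSFER STORE Key File
  , retrieveKeyFile :: Key -> FilePath -> IO Bool  -- ~ TRANSFER RETRIEVE Key File
  , removeKey       :: Key -> IO Bool              -- ~ REMOVE Key
  , checkPresent    :: Key -> IO Bool              -- ~ CHECKPRESENT Key
  }

-- A toy "special remote" that stores file contents in memory, by key.
memoryRemote :: UUID -> Int -> IO Remote
memoryRemote u c = do
  store <- newIORef M.empty
  pure Remote
    { uuid = u
    , cost = c
    , storeKey = \k src -> do
        content <- readFile src
        -- force the lazy read before storing, so the source file is fully consumed
        length content `seq` modifyIORef' store (M.insert k content)
        pure True
    , retrieveKeyFile = \k dest -> do
        m <- readIORef store
        case M.lookup k m of
          Nothing      -> pure False
          Just content -> writeFile dest content >> pure True
    , removeKey = \k -> modifyIORef' store (M.delete k) >> pure True
    , checkPresent = \k -> M.member k <$> readIORef store
    }

-- Cheaper remotes are tried first; sorting by cost models that ordering.
cheapestFirst :: [Remote] -> [Remote]
cheapestFirst = sortOn cost
```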
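For the git-annex branch material (union merging, location tracking, Logs.Presence.Pure), here is a minimal sketch of a presence log. It makes several assumptions:

- Lines have the `<seconds>s <1|0> <uuid>` shape of git-annex location logs.
- Timestamps are plain Doubles instead of git-annex's VectorClock.
- Union merge is approximated as "keep every distinct line from both sides".
- All functions are hypothetical stand-ins for logParser and buildLog.

It also touches the exercise question of how old values are removed: `compactLog` keeps only the newest line per UUID.

```haskell
import Data.List (nub)
import Data.Maybe (mapMaybe)
import qualified Data.Map.Strict as M
import Numeric (showFFloat)
import Text.Read (readMaybe)

-- Simplified stand-ins for the types in Logs.Presence.Pure.
-- A plain Double timestamp replaces git-annex's VectorClock here.
data LogStatus = InfoPresent | InfoMissing
  deriving (Eq, Show)

data LogLine = LogLine
  { date   :: Double     -- seconds since the epoch
  , status :: LogStatus
  , info   :: String     -- for location tracking, a repository UUID
  } deriving (Eq, Show)

-- Parse one line of the form "<seconds>s <1|0> <info>".
parseLine :: String -> Maybe LogLine
parseLine l = case words l of
  (ts : st : rest@(_ : _)) -> do
    secs <- case reverse ts of
      's' : n -> readMaybe (reverse n)
      _       -> Nothing
    s <- case st of
      "1" -> Just InfoPresent
      "0" -> Just InfoMissing
      _   -> Nothing
    Just (LogLine secs s (unwords rest))
  _ -> Nothing

-- Render a line in the same form.
buildLine :: LogLine -> String
buildLine (LogLine d s i) =
  showFFloat Nothing d "" ++ "s " ++ (if s == InfoPresent then "1" else "0") ++ " " ++ i

-- Union merge, approximated: keep every line from both sides, once.
unionMerge :: [String] -> [String] -> [String]
unionMerge ours theirs = nub (ours ++ theirs)

-- Only the newest line per info value matters; older ones can be compacted away.
compactLog :: [LogLine] -> [LogLine]
compactLog = M.elems . M.fromListWith newer . map (\l -> (info l, l))
  where
    newer a b = if date a >= date b then a else b

-- Repositories currently believed to hold the content.
presentIn :: [LogLine] -> [String]
presentIn = map info . filter ((== InfoPresent) . status) . compactLog
```

The newest line wins regardless of line order, so merging in either direction gives the same answer. That is what makes plain line-union merging workable here. This sketch breaks equal timestamps by list order, which is a case a real design has to think about.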
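For the import tree case study, here is a minimal sketch of one planning step. It takes a listing of the special remote and a known mapping from ContentIdentifier to Key, which is the role the outline gives Logs.ContentIdentifier and Database.ContentIdentifier. It then decides which files already have keys and which must be downloaded to generate one. ImportLocation and ContentIdentifier are simplified here to wrappers around strings. The `Listing` type, `planImport` and `recordExport` are hypothetical. A ContentIdentifier is treated as an opaque value, such as an S3 version ID might be.

```haskell
import qualified Data.Map.Strict as M

type Key = String

-- Simplified stand-ins for the data types named in the outline.
newtype ImportLocation = ImportLocation FilePath
  deriving (Eq, Ord, Show)
newtype ContentIdentifier = ContentIdentifier String
  deriving (Eq, Ord, Show)

-- Hypothetical listing: each file's location plus an identifier for its current content.
type Listing = [(ImportLocation, ContentIdentifier)]

-- Split a listing into files whose content already maps to a Key
-- and files that would have to be downloaded so a key can be generated.
planImport :: M.Map ContentIdentifier Key -> Listing
           -> ([(ImportLocation, Key)], [(ImportLocation, ContentIdentifier)])
planImport known = foldr step ([], [])
  where
    step (loc, cid) (have, need) = case M.lookup cid known of
      Just k  -> ((loc, k) : have, need)
      Nothing -> (have, (loc, cid) : need)

-- After an export, remember which ContentIdentifier each exported Key got.
recordExport :: [(ContentIdentifier, Key)] -> M.Map ContentIdentifier Key
             -> M.Map ContentIdentifier Key
recordExport new known = M.union (M.fromList new) known
```

In this model, recording ContentIdentifiers at export time lets an import that follows an export skip every download. That is the "after exporting, importing yields the same tree" goal in miniature.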
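For the Day 2 exercise on Ref, Branch, Sha and Tag, one way to split them into separate types is phantom type parameters. This matches the outline's `Ref Commit` / `Sha Tree` suggestion. It is a hypothetical sketch, not git-annex's Git module. The functions `treeOf`, `toObject` and `commitTreeArgs` are illustrative. The `<rev>^{tree}` revision syntax and `git commit-tree <tree> -p <parent> -m <message>` are real git syntax.

```haskell
-- Empty types used only as phantom tags; no values of them ever exist.
data Commit
data Tree
data Tag
data Object

-- Hypothetical typed replacements for the Ref/Sha aliases.
newtype Sha t = Sha String deriving (Eq, Show)
newtype Ref t = Ref String deriving (Eq, Show)

-- Any typed sha can be viewed as a generic object; the reverse is not offered.
toObject :: Sha t -> Sha Object
toObject (Sha s) = Sha s

-- git's "<rev>^{tree}" syntax names the tree of a commit.
treeOf :: Ref Commit -> Ref Tree
treeOf (Ref r) = Ref (r ++ "^{tree}")

-- Arguments for `git commit-tree`: the tree first, then parents via -p.
-- Passing a Sha Commit where a Sha Tree is expected is now a type error.
commitTreeArgs :: Sha Tree -> [Sha Commit] -> String -> [String]
commitTreeArgs (Sha tree) parents msg =
  ["commit-tree", tree] ++ concatMap (\(Sha p) -> ["-p", p]) parents ++ ["-m", msg]
```

With these types, `commitTreeArgs (Sha "c1" :: Sha Commit) [] "msg"` no longer compiles. The only way to lose the distinction is the explicit `toObject`.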
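For the GitConfig exercise and the Annex monad, here is a sketch of both ideas together, under heavy assumptions:

- Git config is a plain map of lower-cased names.
- `annex.example-retries` is a made-up setting standing in for "a new git config value".
- annex.autocommit is modeled only loosely.
- `Annex` is a hand-rolled reader of a mutable state reference over IO, not git-annex's actual definition.

`getGitConfig` and `remoteList` echo names from the outline, but their bodies here are illustrative. `remoteList` caches the remote names it derives from `remote.<name>.url` entries.

```haskell
import Data.Char (toLower)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import Data.List (stripPrefix)
import qualified Data.Map.Strict as M
import Text.Read (readMaybe)

-- Simplified: git config as a map from lower-cased names to values.
type RepoConfig = M.Map String String

-- Simplified stand-in for Types.GitConfig.
data GitConfig = GitConfig
  { annexAutoCommit     :: Bool -- modeled loosely on annex.autocommit
  , annexExampleRetries :: Int  -- HYPOTHETICAL new setting for the exercise
  } deriving (Eq, Show)

extractGitConfig :: RepoConfig -> GitConfig
extractGitConfig c = GitConfig
  { annexAutoCommit     = getBool "annex.autocommit" True
  , annexExampleRetries = maybe 0 id (M.lookup "annex.example-retries" c >>= readMaybe)
  }
  where
    -- accept git's usual boolean spellings, else fall back to the default
    getBool k def = case fmap (map toLower) (M.lookup k c) of
      Just v | v `elem` ["true", "yes", "on", "1"]  -> True
             | v `elem` ["false", "no", "off", "0"] -> False
      _ -> def

-- "Global" state, reached through the monad rather than real globals.
data AnnexState = AnnexState
  { gitRepoPath :: FilePath       -- stand-in for the real Git.Repo value
  , repoConfig  :: RepoConfig
  , gitConfig   :: GitConfig
  , remoteCache :: Maybe [String] -- filled in on first use
  }

newtype Annex a = Annex { runAnnex :: IORef AnnexState -> IO a }

instance Functor Annex where
  fmap f (Annex g) = Annex (fmap f . g)
instance Applicative Annex where
  pure x = Annex (\_ -> pure x)
  Annex f <*> Annex x = Annex (\r -> f r <*> x r)
instance Monad Annex where
  Annex m >>= k = Annex (\r -> m r >>= \a -> runAnnex (k a) r)

getState :: (AnnexState -> a) -> Annex a
getState f = Annex (fmap f . readIORef)

getGitConfig :: Annex GitConfig
getGitConfig = getState gitConfig

-- Names of remotes, taken from "remote.<name>.url" config keys.
remoteNames :: RepoConfig -> [String]
remoteNames c =
  [ reverse revName
  | k <- M.keys c
  , Just rest <- [stripPrefix "remote." k]
  , Just revName <- [stripPrefix (reverse ".url") (reverse rest)]
  ]

-- Compute the remote list once, then serve it from the state.
remoteList :: Annex [String]
remoteList = Annex $ \ref -> do
  st <- readIORef ref
  case remoteCache st of
    Just rs -> pure rs
    Nothing -> do
      let rs = remoteNames (repoConfig st)
      modifyIORef' ref (\s -> s { remoteCache = Just rs })
      pure rs

runWith :: FilePath -> RepoConfig -> Annex a -> IO a
runWith path c a = do
  ref <- newIORef (AnnexState path c (extractGitConfig c) Nothing)
  runAnnex a ref
```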