Proposal by Joel Kitching for Pharo/Squeak integration with git/mercurial

Proposed by Joel Kitching


Motivation

Monticello is generally regarded as the standard SCM to use on Squeak and Pharo implementations of Smalltalk.  It has served its purpose well as a bare-bones, simple, distributed SCM.  However, Smalltalkers could benefit greatly from a full-featured SCM:

  • Alternate file storage: A software project rarely consists of just code in a given programming language.  Using an alternate SCM would allow Smalltalkers to store other artifacts such as documentation, scripts, diagrams, and so on in the same repository.
  • Atomic commits: Monticello makes it easy to make many changes to a project and then dump the current state to disk.  However, it lacks an easy way to "commit often," or to commit at the granularity of individual lines rather than entire methods.
  • Efficient storage: Monticello traditionally stores a copy of the entire codebase on every commit.  Another SCM would allow for more efficient diff-based storage instead.
  • Blame: Given a particular "version" of a Monticello package, there is no way of telling who wrote what line or what method, and in which revision it was introduced.  Modern SCMs allow a user to tell exactly who committed a particular line of code and when.
  • Community visibility: The Smalltalk community could benefit greatly from using widely recognized hosting services such as Bitbucket, GitHub, and Google Code.  SqueakSource is perhaps less usable and less well known.  Encouraging use of other services would result in greater visibility of Smalltalk code and could stimulate more code sharing and Smalltalk awareness in general.
  • GUI features: SCM support could be tightly integrated into the Squeak or Pharo UI.  For example, we might provide a graphical representation of the branching in a particular project, or introduce an "infinite undo" feature pulling from the repository in the widely-used OmniBrowser.

The SqueakSVN project already exists, but it integrates with SVN, a classic client/server SCM.  The recent trend is towards distributed SCMs such as git and Mercurial, which offer many benefits over their server-based counterparts.  Monticello is itself distributed, so any replacement must be distributed as well to compete with it successfully.  It is therefore natural to look to git or Mercurial for this purpose.  Fortunately, we will likely be able to reuse some portion of the SqueakSVN codebase, as described below, to decrease development costs.

Goal

The goal is to produce a way to manage and version the source code of a single project from within a Squeak/Pharo image.

This will ideally be accomplished by including some existing code and functionality from SqueakSVN, Git for Squeak, and DeltaStreams.

Proposed process

  1. Examine git/Mercurial and choose one based on feature-set, platform support, simplicity and/or existence of API.
  2. Implement a basic Smalltalk interface to the chosen SCM.
    • It can be based on the SCM's command-line interface or on its API if it exists.
    • This interface can be improved and expanded as front-end features are developed that expose more of the SCM's functionality.
    • If git is chosen, inspect the Git for Squeak project to see whether it could serve as a starting point.
  3. Define and implement a file mapping format for the Smalltalk code structure (packages, classes, methods).
    • This is needed because Smalltalk does not have a "native" flat-file code storage format.
    • For this we may be able to borrow the code from SqueakSVN.  It has a simple mapping where directories are classes, and files are either metadata or methods associated with that class.
  4. Choose a way of defining the limits of a particular project: which categories, classes, and methods should be watched for changes and versioned.
    • Consider the Monticello ideology of versioning a package and a method category of the same name preceded by a star (*).  Look into PackageInfo for this.
    • Also consider the SqueakSVN method of including a "current project" dropdown inside of the editor.
  5. Investigate DeltaStreams, which is working towards replacing the ChangeSet functionality in Squeak.  It could provide a useful mechanism for detecting the changes made since the last commit to the repository.  DeltaStreams also already has some support for merging and handling conflicts.
    • Look into making use of DeltaStreams and speak with the author (the co-mentor of this SoC project) to see how this could be useful.
    • Ideally DeltaStreams would do all the work of detecting changes and pass them off in some format that the SCM interface can understand.  The interface would then convert them to the file mapping structure above and commit them.  The reverse would apply for checkouts.
    • DeltaStreams in this respect can be seen as an advanced version of git's staging area.  The git staging area could possibly be kept synchronized with the current changes within the image on-the-fly, or upon the user clicking some UI feature.
  6. Design a very simple, extensible user interface that enables the user to perform basic SCM operations on the repository.
    • At the very least, this would include: committing, checking out, switching branches, performing diffs, merging, and resolving conflicts.
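
As a rough illustration of steps 3 and 4, the sketch below writes one directory per class, with one file per method plus a metadata file, and applies a simplified version of the Monticello star-category convention to decide project membership.  It is written in Python only to keep the example short and checkable; the real implementation would live in the Smalltalk image.  The function names, the class.json metadata file, the .st extension, and the percent-encoding of selectors are all assumptions made for this sketch and are not taken from SqueakSVN or PackageInfo.

```python
import json
from pathlib import Path
from urllib.parse import quote, unquote


def in_package(package, class_category, method_category):
    """Simplified membership test modelled on the Monticello convention."""
    star = "*" + package.lower()
    method_cat = method_category.lower()
    if method_cat.startswith("*"):
        # Extension method: it belongs to the package named after the star.
        return method_cat == star or method_cat.startswith(star + "-")
    # Ordinary method: it follows the category of its class.
    return class_category == package or class_category.startswith(package + "-")


def export_class(root, name, class_category, methods):
    """Write a class as a directory: a metadata file plus one file per method.

    `methods` maps selector -> (method category, source text).
    """
    class_dir = Path(root) / name
    class_dir.mkdir(parents=True, exist_ok=True)
    meta = {
        "category": class_category,
        "methods": {sel: cat for sel, (cat, _) in methods.items()},
    }
    (class_dir / "class.json").write_text(
        json.dumps(meta, indent=2, sort_keys=True), encoding="utf-8"
    )
    for selector, (_, source) in methods.items():
        # Selectors such as at:put: or + are not safe file names everywhere,
        # so they are percent-encoded (an assumption of this sketch).
        file_name = quote(selector, safe="") + ".st"
        (class_dir / file_name).write_text(source, encoding="utf-8")


def import_class(root, name):
    """Read a class directory back into (class category, methods)."""
    class_dir = Path(root) / name
    meta = json.loads((class_dir / "class.json").read_text(encoding="utf-8"))
    methods = {}
    for path in class_dir.glob("*.st"):
        selector = unquote(path.stem)
        methods[selector] = (meta["methods"][selector], path.read_text(encoding="utf-8"))
    return meta["category"], methods
```

Because every method is its own file, the SCM's ordinary line-based diff, blame, and merge tools apply directly to individual methods, which is the main appeal of this kind of mapping.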

Suggested timeline and milestones

Weeks 1-2

  • May 24: Begin coding.
  • Examine the candidate SCMs: git and Mercurial.  Look into existing code (e.g. Git for Squeak), and determine what options exist for creating bindings to an SCM's API.
  • Discuss and solidify plans and timeline with mentors.

Weeks 3-4

  • Implement the necessary features of the interface to the SCM.  Optional features can be left until later, when they are exposed in the GUI.
  • Start to define file mapping format for Smalltalk code structure, or discover how existing code could be used from SqueakSVN.

Weeks 5-8

  • Bulk development.
  • Finish file mapping format for Smalltalk code.  Test importing and exporting.
  • Define a "project" and possibly define an interface to choose the current SCM repository as in SqueakSVN.
  • Work with Göran on interfacing with DeltaStreams to detect changes made to the image codebase, determine whether each change is part of the "project," and translate between the format DeltaStreams reports and the chosen file mapping.
  • July 16: Midterm evaluation.

Weeks 9-11

  • Design some simple interfaces to make use of these features, for example choosing a repository location, committing the current changeset, and importing or checking out new changes.
  • August 9: Pencils down.

Week 12

  • Bugfixes.
  • August 16: Firm pencils down.

Risks

It may be difficult to build an interface that talks to the SCM directly.  However, we can always fall back on using OSProcess to run the command-line version of the tool and parse its output.
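
The fallback can be sketched as follows.  This is a minimal illustration in Python rather than Smalltalk, with subprocess standing in for OSProcess; it assumes git is the chosen SCM and that the git executable is on the path.  The function names are hypothetical.  It runs git status --porcelain, whose output is intended to be parsed by scripts, and splits each line into a two-character status code and a path.

```python
import subprocess


def run_scm(args, cwd="."):
    """Run the git command-line tool and return its standard output as text."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_porcelain(output):
    """Split `git status --porcelain` output into (status, path) pairs.

    Each line is a two-character status code, a space, then the path.
    Renames and quoted paths are not handled in this sketch.
    """
    entries = []
    for line in output.splitlines():
        if len(line) < 4:
            continue  # skip blank or malformed lines
        entries.append((line[:2], line[3:]))
    return entries


def changed_files(cwd="."):
    """List the working-tree changes of the repository at `cwd`."""
    return parse_porcelain(run_scm(["status", "--porcelain"], cwd))
```

Keeping the parser separate from the process call means the parsing logic can be tested against canned output without a repository, and only the thin run_scm layer would need to change if an API binding later replaces the command line.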

End result

A user will be able to manage and develop a Smalltalk project in a git/Mercurial repository.  Sharing with other users will be possible through the SCM's standard method of "pulling" and "pushing."  It will be easy and attractive to store the repository on a public hosting website such as GitHub.

Future directions

If the community is open to adopting an alternative SCM system built by this project, many different directions could be taken afterward:

  • Implement an import/export tool to convert to and from Monticello repositories.
  • Implement a simple user interface for the chosen SCM so that the programmer can work with it from within the Smalltalk environment, instead of using some external application such as gitk.
  • Other integration into the image could be investigated.  For example, integration with the widely used OmniBrowser could allow the user to view older revisions of code, diffs, log comments, etc.
  • Instead of writing to the sources/changes file, consider using a local "system/global" git/Mercurial repository.  This would involve some form of DeltaStreams as described above, and completely replacing writes to the sources file with writes to this SCM repository.

Updated: 9.4.2010