Tuesday, November 22, 2011

Changing how talos.zip gets deployed


For the longest time I have wanted to find the time to work on this project, which should make everyone's life easier:
Bug 673131 - when minor talos changes land, the a-team should be able to deploy with minimal releng time required

What if we could treat talos changes like any other change that lands on the tree?




Currently, we download a new talos.zip that replaces the old one on one of our build machines. This means that as soon as any talos job starts, it grabs the newest talos.zip.
To read about some of the problems this causes, see the bottom of this post.

I will tell you what I want to change even though I don't yet know exactly how to do it.

INITIAL DESIGN
  • the talos job downloads a text file:
    • e.g. hg.mozilla.org/mozilla-central/raw-file/abcd1234567/path/to/talos/config/file.json
  • that file will contain the URL of the talos bundle
    • e.g. people.mozilla.com/~armenzg/talos/talos.zip
  • the talos bundle will be downloaded
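To make the three steps above more concrete, here is a minimal Python sketch of what a talos job could do. It assumes the config file is JSON with a hypothetical `talos_zip` key holding the bundle URL, and it keeps the placeholder path `path/to/talos/config/file.json`; neither the key nor the path is decided yet. The fetch function is injectable so the flow can be exercised without network access.

```python
import json
import urllib.request
from typing import Callable, Tuple

# Placeholder path from the design above; the real in-tree location is undecided.
CONFIG_PATH = "path/to/talos/config/file.json"


def config_url(repo_url: str, revision: str, path: str = CONFIG_PATH) -> str:
    """Step 1: the hg raw-file URL of the config at the revision under test."""
    return f"{repo_url.rstrip('/')}/raw-file/{revision}/{path}"


def bundle_url_from_config(config_text: str) -> str:
    """Step 2: read the talos bundle URL out of the config.

    The "talos_zip" key is a hypothetical schema, not a decided format.
    """
    return json.loads(config_text)["talos_zip"]


def http_fetch(url: str) -> bytes:
    """Plain HTTP GET using only the standard library."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def fetch_talos_bundle(
    repo_url: str,
    revision: str,
    fetch: Callable[[str], bytes] = http_fetch,
) -> Tuple[str, bytes]:
    """Steps 1-3: config file -> bundle URL -> bundle bytes."""
    config_text = fetch(config_url(repo_url, revision)).decode("utf-8")
    bundle_url = bundle_url_from_config(config_text)
    return bundle_url, fetch(bundle_url)
```

A job would call `fetch_talos_bundle("http://hg.mozilla.org/mozilla-central", revision)` with the revision it is testing and write the returned bytes to talos.zip. Because the config is read at that revision, the talos.zip in use becomes part of the changeset rather than whatever happens to be on the build machine at the time.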
INITIAL CONCERNS
  • how do we prevent a talos.zip from containing malicious code and causing us harm?
    • anyone with try commit access could tinker with a machine inside the build network (even though we don't ship anything from such machines)
    • we should find a way to limit this
    • perhaps make this feature available only to a given project branch? An A-team branch?
      • we could add a cgi script to upload a talos.zip
      • maybe we should redesign this to indicate just a "revision" of http://hg.mozilla.org/build/talos and update to it
  • it ties each build to a given talos.zip
    • this means that if you want to try another talos.zip you will have to push a different changeset to specify a different talos.zip to be used even though the build is exactly the same
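Two of the mitigations above can also be sketched in Python: only accepting bundle URLs from an allowlisted host, and the alternative of naming a revision of http://hg.mozilla.org/build/talos and updating to it. The allowlist contents, the helper names, and the 12-to-40 hex-character revision format are assumptions for illustration. Note that a host allowlist only limits where a bundle can come from; it does not inspect what the bundle contains.

```python
import re
from typing import List
from urllib.parse import urlparse

# Hypothetical allowlist of hosts trusted to serve talos bundles.
ALLOWED_BUNDLE_HOSTS = {"people.mozilla.com"}
TALOS_REPO = "http://hg.mozilla.org/build/talos"
# Assumed revision format: lowercase hex, 12-char short hash up to the full 40.
REVISION_RE = re.compile(r"[0-9a-f]{12,40}")


def is_allowed_bundle_url(url: str) -> bool:
    """Accept only http(s) URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and parsed.hostname in ALLOWED_BUNDLE_HOSTS


def is_changeset_id(revision: str) -> bool:
    """True if the whole string looks like an hg changeset hash."""
    return REVISION_RE.fullmatch(revision) is not None


def talos_checkout_commands(revision: str, dest: str = "talos") -> List[List[str]]:
    """Alternative design: pin a build/talos revision instead of a zip URL.

    Returns the hg commands a job would run; it does not execute them.
    """
    if not is_changeset_id(revision):
        raise ValueError(f"not a changeset id: {revision!r}")
    return [
        ["hg", "clone", TALOS_REPO, dest],
        ["hg", "-R", dest, "update", "-r", revision],
    ]
```

Pinning a revision means every talos change would have to land in build/talos first, so it leaves a trace in that repository's history instead of being an opaque zip on someone's web space.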
I am also afraid that this project could easily suffer from scope creep, given how many artifacts we download for talos (e.g. pageloader.xpi, plugins, etc.).

CURRENT SETUP PROBLEMS
I wanted to include this section for anyone curious about the problems we face with the current setup:
  • deploying a new talos.zip requires a downtime; with changes isolated to a changeset landing, there would be no need for downtimes
  • a changeset can have some of its talos jobs run with the old talos.zip and others with the new one
    • this can make some platforms show a new regression on that changeset and others on the next one, which makes it hard to figure out what happened
  • a changeset whose build started before the new talos.zip was deployed can be blamed for causing a regression, even though the regression was caused by the talos.zip deployed *after* the build started
  • a talos.zip change does not show up on the pushlog
    • this means that it can only be noticed if a note is sent to dev.tree-management or if the Maintenance page is updated correctly
    • this means that it cannot be backed out by a developer


This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.
