#158096 - More on use of source code managers
I did use the everyday zip archives once. But I stopped doing that maybe 12-15 years ago.
Every new version produced one zip with the distributable(s) and one zip with the source. In the end, I normally never used the source zip file; it was too much work to bother with. Only a very major failure somewhere in the software would motivate decompressing earlier zip files for comparison. The most "favourite" versions (major version steps and the last five or ten minor steps) I kept on the disk for a bit quicker access. But the problem with having the source in a zip file is that I don't know the age of a source line. If the line is in version 1.2.5, I still don't know if it is also in 1.2.4 or in 1.1.3 or where it came in.

-----

The companies I have worked for have normally used either free source code managers or M$ source code managers. The M$ ones cost money and you don't have any second source to hope for in case you find a problem. Except for the possible cost of buying a commercial source code manager, there is normally only one disadvantage: you have to learn how to use it.

If you select the wrong tool, you may get a couple of extra disadvantages:

- Hard to know whether the repository is broken or not, while you can scan your zip files to verify that they are ok.

- Hard to rename files or directories without duplicating them in the repository. But a zip file has no method at all to match an old file or directory under one name with the same file or directory after a rename, so even a bad source code manager will not lose to individual zip files.

- No way to store file access flags, ownership etc. Better archive tools can do this, so the source code manager may lose here. A possible workaround is to create a script file that restores attributes/ownership/links etc.

- A zip file is compressed, so it takes little space, but every new release is a complete set of files. A repository contains delta-coded files, so even a repository without compression normally manages to keep down the total disk space needed and the amount of data to back up. It is only when binary full-copy files are stored that you may be unhappy, if you need to store many big uncompressed versions in full in the repository. With today's disk sizes the storage needed is seldom a problem; it may be a problem for the backup system.

- One zip file per version means constant time to extract any version. A repository normally stores a large number of differences, so it may be fast to restore the current version, but every step back to a previous version needs yet one more patch step. You seldom need to step back many versions, and today's machines are fast, so stepping back 100 versions is seldom a problem. And better source code repositories allow you to forget old, unimportant file versions. If you are currently working on version 8.x, you can for example throw away all sub-versions between 1.0 and the last 1.x release and between 2.0 and the last 2.x release, since you are hardly interested in stepping back to minor bug fixes released ten years or more ago.

- With zip files, you normally compress your working directory manually and copy the zip file to some place where it gets backed up. One problem here is to separate the files to copy from the files to skip. With a source code repository, you normally specify lists of files to ignore (*.o, *.obj, *~, *.bak, IDE desktop history etc), so it is normally easier to extract the important input files from the random noise in your working directory (see the ignore-file sketch after this list).

- With zip files, you produce a new file every time, so you get one more file to write to a backup. Quite boring to perform the 100th full backup of an old zip file. Or you don't run backups but instead archive every new zip file, which requires you to write a script or manually make sure that the new zip file gets archived in a suitable number of copies. With a source code repository, you normally run a full backup every night. That is more data to take care of than a single new zip file, but compared to a full backup that copies all zip files it will be less data. A new version that changes a couple of lines may represent a couple of kB of extra data: the new source lines, the diff that removes them to roll back to the previous version, and the tags that stamp a current position in every source file.

- Zipping your code every night forms a linear progression. But what do you do when you want to experiment? Creating development branches with zip files is almost impossible. You will have a tough time giving the zip files unique names pointing at the date when you split the code, the date when you backed up the split, and what feature you were working on in the split. With a repository, most of this magic is hidden. Using a GUI you can even see the branching as trees.

- Branching can be thought stupid. If you plan your features correctly in relation to a release plan, then you may always have the time to implement new features in the normal linear sequence. But what if you release new hardware that is incompatible, or release a new major version that the customer is required to pay for? If you need to release critical fixes for the old hardware platform or the old major version, then you may be forced to split the code. Not much trouble if the fix is small. But if the fix is big, then you will be a bit sad to have to implement just about the same fix twice: once in the old version and once in the new version of the code. Using zip files, you must make sure that you are great with diff3 and patch, to diff three versions of all source files, extract the changes and then try to merge them from one version to another. Since a source repository knows about the relations between different versions, it is way easier to extract a fix from the split point to the current version and merge that change into the split-off code for the older hardware or the older major version (see the merge sketch after this list).
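As an example of the ignore lists: CVS reads a .cvsignore file in each directory, so the noise filtering can be committed together with the code. A minimal sketch (the patterns are just examples):

  # tell CVS to skip object files, backups and editor noise
  echo '*.o *.obj *~ *.bak' > .cvsignore
  cvs add .cvsignore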
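And a rough sketch of the branch-and-merge case in CVS (all tag and module names here are made up):

  # create a maintenance branch at the point where the code split
  cvs tag -b old-hw-branch
  # check out the old-hardware code when a trunk fix is needed there
  cvs checkout -r old-hw-branch myproject
  # let CVS merge the trunk changes between two tags into the branch,
  # instead of juggling diff3 and patch by hand
  cvs update -j fix-start -j fix-done
  cvs commit -m "Merged trunk fix into old-hardware branch"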
There are more situations to discuss, but in reality the only disadvantage of a source code manager is that you need to learn how to use it. In all other respects it will match or beat the zip solution. A few are so expensive that you can't afford them; that would obviously represent a second disadvantage. But for a smaller operation, the free alternatives should be good enough.

-----

I try as much as I can to write specifications in XHTML, so that I can commit and tag the specifications together with the source code. Then I can extract the difference in code and specification with a single command. Just for reference, I normally make a full copy of each revision of the documents and place it on a web server, but that is for the benefit of others. While developing, I normally rely only on the information I have in the repository. Many repositories can be run on a WinNT++, GNU/Linux or BSD machine, depending on what the user is comfortable with hosting.
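In CVS, and with made-up release tags, that single command could look like:

  # tag source and XHTML specification together at release time
  cvs tag REL_1_2_5
  # later: one command shows how both code and specification
  # changed between two releases
  cvs diff -u -r REL_1_2_4 -r REL_1_2_5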
-----

Most work involves multiple developers, and the repositories are stored on a central server that I administer. The other users do not have administrative rights to the machine. They use an ssh tunnel to access one or more repositories, depending on what projects they are involved in. Some projects are Windows-based and some are GNU/Linux-based. Windows-based projects normally use the CVSNT client for direct command-line access, or optionally WinCVS or TortoiseCVS. I'm a command-line guy, so I can't really supply any comments about the GUI extensions. But it is important that at least the Windows-based developers have access to graphical front-ends if they want to use them. A source code manager that you are not comfortable with will not be used as much as it should.

-----

At least for unix-based development, Git sounds very promising, but switching tools can be a bit expensive when there are no uncritical pilot projects available for a slow migration.

-----

I can perform many commits during a day. I normally split the job into minor steps, basically the smallest step that I am able to recompile and run, and then I commit all changed files together in a single commit. For example, I may have changed a variable from 16-bit to 32-bit (at the same time changing its name, to make sure that the compiler helps me pick up all references to the variable, so I know that I have reviewed whether they are affected by the type change). That would form a single source code update/commit. Or I may fix a single bug and then commit it, specifying the Bugzilla ID of the solved bug in the commit comment and the history file. But the big thing is that commits are not tied to the next release of the software or to the calendar day. With zip files, you normally produce one zip file per day or one per version. That is too coarse a granularity to track changes. I may produce quite a lot of code lines or changes in a day, but if something breaks, I'm far more likely to want to look at the individual commits and see which commit was most likely to have introduced the specific bug. Then I may have 1-50 lines of code to take a closer look at, instead of the possibly thousands of lines of code that may have changed between two releases.
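From the command line, such a bug-fix commit could look like this sketch (server name, module and bug ID are made up; one common way to run CVS over ssh is the :ext: method):

  # reach the central repository over ssh
  export CVS_RSH=ssh
  cvs -d :ext:pwm@server:/cvsroot checkout myproject
  # smallest step that compiles and runs; all affected files
  # are committed together, with the Bugzilla ID in the comment
  cvs commit -m "Fix off-by-one in buffer wrap (Bugzilla #1234)"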
-----

Normally no tools are in the repository, unless the tool is a minor one written by myself or someone involved in the project. Tools such as a compiler change quite seldom but represent a huge amount of binary changes when replaced. Because of this, I normally keep binary archives (more or less your zip files) with that kind of tool. If possible, I also try to keep full installations of the tools in VMware virtual machines, to allow me to switch back to an existing and working installation. With a virtual machine, the build machine may explode, but I can still in a short time get another machine to build with a perfect replica of the previous build machine.

-----

CAD files are normally handled as individual zip files. It is up to the engineer creating them to decide how he likes to handle his files locally. Each new release gets a release number and is stored in a zip file, together with diff information about relevant changes to the BOM and functional changes relevant for a manual or for developing firmware for that specific release.

-----

I normally avoid having a source code repository locally. By making sure that it is stored on a different machine, a disk failure will never kill my working code and the repository at the same time. If my work disk breaks, I will lose uncommitted changes. If the repository disk breaks, I will lose the comments and fine-granularity check-ins made since the previous nightly backup, but my work disk will contain a full set of the latest code. Most source code managers work well with quite small/slow machines, so in a one-man operation or for hobby use it can be enough to run a little Via Epia machine consuming maybe 20W of power.

-----

Most code projects are in their own trees. In some situations, libraries and the applications using the libraries are split: sometimes by requiring the application to just link to a pre-built library, sometimes by requiring the library to be checked out into a sub-directory under the application build tree. I haven't checked for Subversion, but at least for CVS it is also possible to define packages, where the CVS server creates a meta-project that extracts files from different parts of the repository and joins them into a common build tree. However, the common method is to manually check out the library into a sub-directory if I'm going to extend the library and want the application as a test bench. If I'm just going to update an application, then the application gets linked to a pre-built library.
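For reference, the CVS mechanism for such meta-projects is the CVSROOT/modules file; a small sketch with made-up names (an '&' entry pulls another module into the checkout):

  # 'cvs checkout app_with_lib' fetches apps/myapp
  # with libs/mylib checked out inside it
  mylib        libs/mylib
  app_with_lib apps/myapp &mylib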
-----

What I meant by private/shared machines was that the configurations for machines I administer myself or together with other people are stored in a repository. If something breaks, the history will tell what changes have been made. If the machine needs to be reinstalled, the repository has a backup of the previous configuration. If an update of a package overwrites a configuration, the repository will tell me the changes introduced by the application maintainer between the two application versions, and the changes introduced by me or other administrators. This allows the two sets of updates to be merged together, i.e. we get our extensions while at the same time getting any new features/security fixes from the upstream maintainer.

-----

I normally haven't found it to be a problem when multiple users are working with the same code. Most changes are made to different files, and even when the same files are involved, the tools normally manage quite well. But it may take a bit of getting used to. You should not work for a week at a time before committing. Then you have too many changed code lines: you get many collisions, and even without collisions you get a larger probability that the joined code will not compile ok on the next nightly build. And when something doesn't work, you get too many source code lines to scan through to figure out which change broke the code. Did you try viewing annotated code, to see the date/developer for each single source code line? It is quite handy when trying to figure out what may have gone wrong.

For CVS, it may look like:

1.1 (pwm 28-Apr-08): function load_image($image_name,$type) {
1.4 (pwm 28-May-08):     if ($type == 'image/pjpeg'
1.4 (pwm 28-May-08):         || $type == 'image/jpeg') {
1.1 (pwm 28-Apr-08):         $in_img = imagecreatefromjpeg($image_name);
1.3 (pwm 29-Apr-08):     } else if ($type == 'image/gif') {
1.3 (pwm 29-Apr-08):         $in_img = imagecreatefromgif($image_name);
1.7 (pwm 28-May-08):     } else if ($type == 'image/png') {
1.7 (pwm 28-May-08):         $in_img = imagecreatefrompng($image_name);
1.1 (pwm 28-Apr-08):     } else {
1.7 (pwm 28-May-08):         $ext = get_extension($image_name);
1.7 (pwm 28-May-08):         if ($ext == '.jpg' || $ext == '.jpeg') {
1.7 (pwm 28-May-08):             $in_img = imagecreatefromjpeg($image_name);
1.7 (pwm 28-May-08):         } else if ($ext == '.gif') {
1.7 (pwm 28-May-08):             $in_img = imagecreatefromgif($image_name);
1.7 (pwm 28-May-08):         } else if ($ext == '.png') {
1.7 (pwm 28-May-08):             $in_img = imagecreatefrompng($image_name);
1.7 (pwm 28-May-08):         } else {
1.7 (pwm 28-May-08):             $in_img = false;
1.1 (pwm 28-Apr-08):         }
1.1 (pwm 28-Apr-08):     }
1.7 (pwm 28-May-08):     if ($in_img) return $in_img;

As you can see, by annotation I do not mean a programmer's log, i.e. some form of history file. I mean a view of the source code file where the source code repository has augmented the code with version information. Your use of annotated code is more the running log, for example the result of cvs log:

----------------------------
revision 1.10
date: 2008-07-30 21:01:06 +0200; author: pwm; state: Exp; lines: +2 -2;
Wrong name when destroying a photo.
----------------------------
revision 1.9
date: 2008-07-28 23:12:13 +0200; author: pwm; state: Exp; lines: +8 -1;
Added Doxygen file header.
----------------------------
revision 1.8
date: 2008-05-28 22:02:23 +0200; author: pwm; state: Exp; lines: +16 -7;
Switched to 32x32 icons. Edit of new flag if a photo is a guest photo.
Hide edit/delete buttons for all other photos if info for one photo is
being edited.
----------------------------

This can be handled by a separate log file, but the advantage of having it in the repository is that I can perform a selection between two versions or between two dates, and extract the log information or the source code diff between those versions. A programmer's log need not be kept up to date in relation to the commits, so it will not bind a history comment to the actual changed source lines.
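Such a selection could, still in CVS, look like this (the file name is made up; the revisions are the ones from the log above):

  # log messages for everything committed during May 2008
  cvs log -d "2008-05-01<2008-06-01" image.php
  # the actual source change between two known revisions
  cvs diff -u -r 1.8 -r 1.10 image.php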