<?xml version="1.0" encoding="UTF-8"?> <rss
version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
> <channel><title>Adrian Kosmaczewski &#187; Architecture</title> <atom:link href="http://kosmaczewski.net/category/architecture/feed/" rel="self" type="application/rss+xml" /><link>http://kosmaczewski.net</link> <description></description> <lastBuildDate>Mon, 06 Feb 2012 08:40:05 +0000</lastBuildDate> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <generator>http://wordpress.org/?v=3.3.1</generator> <item><title>Code Organization in Xcode Projects</title><link>http://akosma.com/2009/07/28/code-organization-in-xcode-projects/</link> <comments>http://akosma.com/2009/07/28/code-organization-in-xcode-projects/#comments</comments> <pubDate>Tue, 28 Jul 2009 16:01:50 +0000</pubDate> <dc:creator>akosma software</dc:creator> <category><![CDATA[Architecture]]></category> <category><![CDATA[Code]]></category> <category><![CDATA[iPhone]]></category> <category><![CDATA[Opinion]]></category> <category><![CDATA[Objective-C]]></category> <category><![CDATA[Xcode]]></category> <guid
isPermaLink="false">http://kosmaczewski.net/?p=1710</guid> <description><![CDATA[Xcode does not impose any structure on your source code tree. This is both cool and useful for quickly throwing together a couple of lines for a prototype, but in my experience, this approach does not scale. More often than not, &#8230; <a
href="http://akosma.com/2009/07/28/code-organization-in-xcode-projects/">Continue reading <span
class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<p>Xcode does not impose any structure on your source code tree. This is both cool and useful for quickly throwing together a couple of lines for a prototype, but in my experience, this approach does not scale. More often than not, without any hygiene, your project can become a mess. Just using Xcode defaults, after a while your resources will sit beside your .xcodeproj file, all the project classes will be thrown together in the Classes folder, and if you have a relatively large project, this approach makes finding individual files painful.</p><p>Of course, Xcode provides &#8220;Groups&#8221; to organize your source code, but the idea is to be able to quickly identify the different kinds of files that make up your Xcode project, either for Mac or for the iPhone, without having to open the Xcode project file. This means having both a folder structure and an internal source code file structure. All of this will help you maintain your project in the future, which means lower costs and less time spent looking for bugs.</p><p>All of this is also particularly useful when browsing projects via Google Code, GitHub or any other kind of file view of source code repositories. If your code is organized in a nice folder structure, it is easier to explore than if all the files sit in the same folder.</p><p>In this post I will enumerate some best practices that I use in all of my projects. <span
id="more-1710"></span> So let&#8217;s say that you start a new Xcode project. Here&#8217;s the Xcode window that is presented to you (seen in &#8220;Condensed&#8221; mode, which is the one I prefer):</p><p><img
src="http://kosmaczewski.net/wp-content/uploads/2009/07/1.png" alt="1" title="1" width="501" height="555" class="alignnone size-full wp-image-1671" /></p><p>This is the project layout as seen in the Finder:</p><p><img
src="http://kosmaczewski.net/wp-content/uploads/2009/07/0.png" alt="0" title="0" width="550" height="266" class="alignnone size-full wp-image-1677" /></p><p>As you can see (and as you might have experienced), in the default layout used by Xcode, all new source code files will be thrown into the &#8220;Classes&#8221; folder, while all the new Resources will be just stored in the project root. In the long term, this layout can be really painful to deal with. So let&#8217;s just start rearranging things a bit:</p><h3>Organize source files in folders and mirror them in Xcode</h3><p>When I start a new Xcode project, I usually do the following:</p><ol><li>I remove the &#8220;Classes&#8221; group (just deleting references, not moving the items to the trash, as shown in the image below) <img
src="http://kosmaczewski.net/wp-content/uploads/2009/07/2.png" alt="2" title="2" width="500" height="122" class="alignnone size-full wp-image-1680" /></li><li>Then I add the following subfolders to the &#8220;Classes&#8221; folder:<ul><li>AppDelegate</li><li>Controllers</li><li>Models</li><li>Helpers</li></ul></li><li>Finally, I drag the enhanced “Classes” folder from the Finder to the Xcode project window, asking Xcode to “recursively create groups for every subfolder”: <img
src="http://kosmaczewski.net/wp-content/uploads/2009/07/3.png" alt="3" title="3" width="414" height="388" class="alignnone size-full wp-image-1673" /></li></ol><h3>Create separate folders for different Resource elements</h3><p>The next step is to create a &#8220;Resources&#8221; folder in the Finder, and add a subfolder for every kind of resource that my project will use: sounds, images, SQLite databases, NIBs, etc. Then I do the following:</p><ol><li>I remove the &#8220;Resources&#8221; group from the Xcode project (just deleting references)</li><li>Then I drag the newly created &#8220;Resources&#8221; folder from the Finder to the Xcode project window, asking Xcode to &#8220;recursively create groups for every subfolder&#8221;, as we did for the &#8220;Classes&#8221; folder.</li></ol><p>Doing this has an interesting side effect: when you localize your application into other languages, each folder will contain a subfolder with the localized resources inside (for example, an &#8220;en.lproj&#8221; for English, &#8220;es.lproj&#8221; for Spanish, and so on).</p><h3>Organize your code consistently</h3><p>Each @implementation *.m file should always present methods in this order:</p><ol><li>init and dealloc</li><li>public methods</li><li>public @dynamic properties</li><li>delegate methods (for each supported protocol)</li><li>private methods</li></ol><h3>Use #pragma statements to separate the regions shown above</h3><p>Each logical group of methods should be separated from the others using the following lines (just type &#8220;#p&#8221; and hit the TAB key in Xcode!):
[source:c:firstline(79)]
// &#8230;
return cell;
}

#pragma mark -
#pragma mark UIAlertViewDelegate methods

- (void)alertView:(UIAlertView *)alertView
clickedButtonAtIndex:(NSInteger)buttonIndex
{
// &#8230;
[/source]</p><p>The advantage of this approach is that later, you can use those #pragma marks to generate an automatic layout in the symbols pop-up of Xcode: <img
src="http://kosmaczewski.net/wp-content/uploads/2009/07/6.png" alt="6" title="6" width="305" height="342" class="alignnone size-full wp-image-1687" /></p><p>You can get this pop-up window by clicking on this sector of your Xcode window: <img
src="http://kosmaczewski.net/wp-content/uploads/2009/07/7.png" alt="7" title="7" width="438" height="156" class="alignnone size-full wp-image-1688" /></p><h3>Only leave public methods in the header files</h3><p>This means putting private methods definitions in a (Private) category on top of the *.m file. This will remove all compiler warnings (about &#8220;this class might not respond to this selector&#8221;) and will cleanly separate what&#8217;s public from what&#8217;s not:
[source:c:firstline(9)]
#import "UntitledViewController.h"

@interface UntitledViewController (Private)
- (id)returnPrivateObject;
- (void)changeInternalState:(NSString *)param;
@end

@implementation UntitledViewController

- (id)init
{
[/source]</p><h3>Use consistent coding conventions</h3><p><a
href="http://wiki.akosma.com/Objective-C_Code_Standards">You can use my own</a>, if you wish.</p><h3>Treat warnings as errors</h3><p><a
href="/2009/07/16/objective-c-compiler-warnings/">I&#8217;ve said a lot about that in a previous post</a>, but here&#8217;s a quick reminder:</p><p><img
src="http://kosmaczewski.net/wp-content/uploads/2009/07/4.png" alt="4" title="4" width="500" height="572" class="alignnone size-full wp-image-1681" /></p><h3>Create &#8220;Distribution&#8221; configurations for the new project</h3><p>Duplicate the &#8220;Release&#8221; configuration twice to create two new ones: &#8220;Distribution Ad Hoc&#8221; and &#8220;Distribution App Store&#8221;. Each one will have to be configured with its corresponding provisioning profile.</p><h3>Add an &#8220;Entitlements.plist&#8221; file to the project</h3><p>Remember to uncheck (disable, turn off) the &#8220;get-task-allow&#8221; value (I still don&#8217;t understand why every Xcode project does not create this file automatically). Then add the &#8220;Entitlements.plist&#8221; value to the corresponding key in the &#8220;distribution&#8221; configurations created in the previous step.</p><h3>Use source control</h3><p>As soon as your project source tree is ready, commit it to your repository, whichever one you use.</p><h3>Conclusion</h3><p>This is how the final project might look:</p><p><img
src="http://kosmaczewski.net/wp-content/uploads/2009/07/5.png" alt="5" title="5" width="501" height="593" class="alignnone size-full wp-image-1675" /></p><p>Of course,  you might as well enforce the above best practices using your own default project templates; depending on your requirements, this might be a useful thing to do. You must store those new Xcode templates in the following locations:</p><ul><li><strong>iPhone:</strong> /Developer/Platforms/iPhoneOS.platform/Developer/Library/Xcode/Project Templates</li><li><strong>Mac:</strong> /Developer/Library/Xcode/Project Templates</li></ul><p>Hope this helps! As usual, feel free to add your comments, best practices, rants and other reactions in the comments section below.</p> ]]></content:encoded> <wfw:commentRss>http://akosma.com/2009/07/28/code-organization-in-xcode-projects/feed/</wfw:commentRss> <slash:comments>3</slash:comments> </item> <item><title>Adding Manpower</title><link>http://kosmaczewski.net/adding-manpower/</link> <comments>http://kosmaczewski.net/adding-manpower/#comments</comments> <pubDate>Fri, 08 Aug 2008 07:30:42 +0000</pubDate> <dc:creator>Adrian</dc:creator> <category><![CDATA[Architecture]]></category> <category><![CDATA[Books]]></category> <category><![CDATA[Papers]]></category> <category><![CDATA[Project Management]]></category> <category><![CDATA[Software]]></category> <category><![CDATA[process]]></category> <category><![CDATA[productivity]]></category> <category><![CDATA[project]]></category> <guid
isPermaLink="false">http://kosmaczewski.net/?p=1247</guid> <description><![CDATA[Published in 1975, &#8220;The Mythical Man-Month&#8221; is considered an all-time classic in the software engineering field. The book&#8217;s author, Frederick P. Brooks Jr., used his experience as the project manager of the IBM System/360 and its software, the Operating System/360, &#8230; <a
href="http://kosmaczewski.net/adding-manpower/">Continue reading <span
class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<p>Published in 1975, &#8220;The Mythical Man-Month&#8221; is considered an all-time classic in the software engineering field. The book&#8217;s author, Frederick P. Brooks Jr., used his experience as the project manager of the IBM System/360 and its software, the Operating System/360, to explain a common set of problem patterns, applicable to other software projects as well.</p><p>One of the most famous quotes in the book is the one regarding the consequences of adding human resources to a late project; this article will provide a couple of thoughts about this assertion, and highlight some contrary opinions. <span
id="more-1247"></span> <strong>The Mythical Man-Month</strong></p><p>The second chapter of Brooks&#8217; masterpiece has the same title as the book, &#8220;The Mythical Man-Month&#8221;; the core argument of this chapter is that the most frequent cause of project failure is poor schedule and time estimation. Brooks states that this is due to the fact that</p><blockquote>Men and months are interchangeable commodities only when a task can be partitioned among many workers <strong>with no communication among them.</strong> This is true of reaping wheat or picking cotton; it is not even approximately true of systems programming.
When a task cannot be partitioned because of sequential constraints, the application of more effort has no effect on the schedule. The bearing of a child takes nine months, no matter how many women are assigned.</blockquote><p>(Brooks, pages 16 &amp; 17)</p><p>The final phrase of the above paragraph is often used as a graphic illustration of the nature and meaning of Brooks&#8217; law. It implies the strong need for communication and integration existing in software projects; software development, being a social process, requires a strong network of communication between team members, allowing them to coordinate the inherent set of interdependencies that every project has.</p><p>After an interesting analysis of common time overrun situations, Brooks ends this chapter with the following conclusion, which contains the statement of the law itself:</p><blockquote>Oversimplifying outrageously, we state Brooks&#8217;s Law: <strong>Adding manpower to a late software project makes it later.</strong> This is then the demythologizing of the man-month. The number of months of a project depends upon its sequential constraints. The maximum number of men depends upon the number of independent subtasks. From these two quantities one can derive schedules using fewer men and more months. (The only risk is product obsolescence.) One cannot, however, get workable schedules using more men and fewer months. More software projects have gone awry for lack of calendar time than for all other causes combined.</blockquote><p>(Brooks, pages 25 &amp; 26)</p><p>This &#8220;law&#8221; is known and cited throughout the industry as an example of a common pattern, observed time and again in different projects all over the world:</p><blockquote><strong>Fact 3: Adding people to a late project makes it later</strong>(&#8230;)
Intuition tells us that, if a project is behind schedule, staffing should be increased to play schedule catch-up. Intuition, this fact tells us, is wrong. The problem is, as people are added to a project, time must be spent on bringing them up to speed.(&#8230;)
Furthermore, the more people there are on a project, the more the complexity of its communication rises.</blockquote><p>(Glass, page 16)</p><p>Speaking from personal experience, reading this book opened my eyes more than many, many other books did. It is a fun read, but also an enlightening one: many anecdotes told by Brooks strangely correspond to my own experience, and this one is no exception. I have seen projects fall behind schedule simply because more people were added; and in one particular case, the project was cancelled altogether. These projects had several factors in common, though:</p><ul><li><strong>Bad documentation, or lack thereof;</strong> the only way for newcomers to the project to know what was going on was to interrupt the other developers, disrupting ongoing work on the project; I think that a good set of documents, describing both the high-level architecture and the low-level APIs, is needed for new developers to jump in and catch up. It may not be enough, but it is a good leap forward anyway.</li><li><strong>Lack of architectural vision;</strong> projects that do not have an architect, providing vision and technical leadership to the team, are in my opinion exposed to problems when more developers join the project. The architect can act as a proxy, guiding new developers while they familiarize themselves with the project and shielding other developers from this task.</li><li><strong>Bad project decomposition into components;</strong> if the system to be developed is sufficiently large, and the decomposition into components is not properly done, the overlap and extended communication paths among team members might affect the whole project negatively. A good decomposition breaks down the whole project into a set of smaller ones, with the corresponding set of interfaces, which lets the team work separately on different subsystems. 
In such projects, the risk of falling further behind by adding manpower is reduced proportionally.</li><li><strong>Bad working conditions;</strong> I firmly believe that open spaces are a common disease in our industry. Teams working in open spaces suffer more from noise and visual distractions, and this is more evident when new team members join the project.</li></ul><p><strong>Criticism</strong></p><p>However famous, Brooks&#8217; law has attracted a good deal of criticism as well, regarding the specific characteristics of projects that might be affected when new people are assigned to them. The OS/360 project, which served as the basis for Brooks&#8217; work, might not be similar to other projects, and as such, the law would not necessarily apply to them:</p><blockquote>For Brooks’ Law to be true, the amount of training effort required from existing staff must be significant. The amount of effort lost to training must exceed the productivity contributed by new staff when they eventually become productive. (&#8230;)
&#8220;Late&#8221; chaotic projects are likely to be much later than the project manager thinks&#8211;project completion isn’t three weeks away, it’s six months away. Go ahead and add staff. You’ll have time for them to become productive. Your project will still be later than your plan, but that’s not a result of Brooks’ Law. It’s a result of underestimating the project in the first place.(&#8230;)
Controlled projects are less susceptible to Brooks’ Law than chaotic projects. Their better tracking allows them to know when they can safely add staff and when they can’t. Their better documentation and better designs make tasks more partitionable and training less labor intensive. They can add staff later in the project with less risk to the project.</blockquote><p>(McConnell, 1999)</p><p>Scott Berkun gives a more concrete analysis of why the law could be wrong:</p><blockquote><ul><li><strong>It depends who the manpower is.</strong> The law assumes that all added manpower is equal, which is not true.</li><li><strong>Some teams can absorb more change than others.</strong> Some teams are more resilient to change.</li><li><strong>There are worse things than being later.</strong> (&#8230;) That can be ok if you also get higher quality</li><li><strong>There are different ways to add manpower.</strong> (&#8230;) The more experience everyone has with mid-stream personnel changes, the better.</li><li><strong>It depends on why the project was late to begin with.</strong> (&#8230;) no amount of programming staff modifications will resolve the psychiatric needs of team leaders or the dysfunctions of executives.</li><li><strong>Adding people can be combined with other management action.</strong> (&#8230;) if you’re removing your worst, and most disruptive, programmer and adding one of your best, it can be a reasonable choice.</li></ul></blockquote><p>(Berkun, 2006)</p><p>And what about open source projects? Many of these (Linux, Apache, MySQL) are potentially among the biggest software projects ever undertaken, and they don&#8217;t appear to suffer from the problems described by Brooks&#8217; law:</p><blockquote>But proponents of open source and free software development, including Linux developers, are not completely satisfied with the Law. 
Most famously (among geeks at any rate), Eric Raymond in his &#8220;The Cathedral and the Bazaar,&#8221; declared Brooks&#8217; Law obsolete, if not simply limited, saying &#8220;if Brooks&#8217; Law were the whole picture, Linux would be impossible.&#8221;
Although Raymond now says that he has somewhat modified his views or was misunderstood, some still would say he is given to oversimplifying and outrageousness himself. &#8220;I don&#8217;t consider Brooks&#8217; Law &#8216;obsolete&#8217; any more than Newtonian physics is obsolete; it&#8217;s just incomplete. Just as you get non-Newtonian effects at high energies and velocities, you get non-Brooksian effects when transaction costs go low enough. Under sufficiently extreme conditions, these secondary effects dominate the system &#8212; you get nuclear explosions, or Linux.&#8221;</blockquote><p>(Jones, 2000)</p><p><strong>Conclusion</strong></p><p>The discussion still seems to be open. The scale of a project might be a factor that determines how exposed it is to Brooks&#8217; law. I think that research is needed to arrive at a conclusion, even if only a statistical one.</p><p>Other important ideas highlighted in the book are the &#8220;second-system effect&#8221;, the productivity advantage of using high-level languages, and the importance of building a prototype &#8211; &#8220;one to throw away&#8221;. I can only recommend this book to everyone interested in the field of software engineering (which I did in my own review of classic books on this blog: <a
href="http://kosmaczewski.net/2005/11/20/my-bookshelf-part-iii/">http://kosmaczewski.net/2005/11/20/my-bookshelf-part-iii/</a>)</p><p><strong>References</strong></p><p>Berkun, S.; &#8220;Exceptions to Brooks’ Law&#8221;, January 11th, 2006, [Internet] <a
href="http://www.scottberkun.com/blog/2006/exceptions-to-brooks-law/">http://www.scottberkun.com/blog/2006/exceptions-to-brooks-law/</a> (Accessed June 8th, 2007)</p><p>Brooks Jr., F. P.; &#8220;The Mythical Man-Month &#8211; Essays on Software Engineering, Anniversary Edition&#8221;, 1995, Addison Wesley, ISBN 0-201-83595-9</p><p>Glass, R. L.; &#8220;Facts and Fallacies of Software Engineering&#8221;, Addison-Wesley, 2003, ISBN 0321117425</p><p>Jones, P.; &#8220;Brooks&#8217; Law and open source: The more the merrier?&#8221;, IBM, May 1st, 2000, [Internet] <a
href="http://www.ibm.com/developerworks/linux/library/os-merrier.html">http://www.ibm.com/developerworks/linux/library/os-merrier.html</a> (Accessed June 8th, 2007)</p><p>McConnell, S.; &#8220;Brooks&#8217; Law Repealed?&#8221;, IEEE Software, November/December 1999 [Internet] <a
href="http://stevemcconnell.com/ieeesoftware/eic08.htm">http://stevemcconnell.com/ieeesoftware/eic08.htm</a> (Accessed June 8th, 2007)</p> ]]></content:encoded> <wfw:commentRss>http://kosmaczewski.net/adding-manpower/feed/</wfw:commentRss> <slash:comments>3</slash:comments> </item> <item><title>Challenges for Software Engineers</title><link>http://kosmaczewski.net/challenges-for-software-engineers/</link> <comments>http://kosmaczewski.net/challenges-for-software-engineers/#comments</comments> <pubDate>Sun, 03 Aug 2008 20:54:28 +0000</pubDate> <dc:creator>Adrian</dc:creator> <category><![CDATA[Architecture]]></category> <category><![CDATA[Opinion]]></category> <category><![CDATA[Papers]]></category> <category><![CDATA[Software]]></category> <category><![CDATA[Technology]]></category> <category><![CDATA[business]]></category> <category><![CDATA[conference]]></category> <category><![CDATA[Google]]></category> <category><![CDATA[hiring]]></category> <category><![CDATA[internet]]></category> <category><![CDATA[Microsoft]]></category> <category><![CDATA[NeXT]]></category> <category><![CDATA[Open Source]]></category> <guid
isPermaLink="false">http://kosmaczewski.net/?p=1245</guid> <description><![CDATA[Software Engineering is the youngest of all the professions, born around 50 years ago, but since then it has been continually improved. Practitioners have fiercely debated it through the years, given the extremely fast pace of the innovations &#8230; <a
href="http://kosmaczewski.net/challenges-for-software-engineers/">Continue reading <span
class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<p>Software Engineering is the youngest of all the professions, born around 50 years ago, but since then it has been continually improved. Practitioners have fiercely debated it through the years, given the extremely fast pace of the innovations in the field, and the extremely difficult and inherently dynamic nature of software. Many trends have appeared and vanished, and many others will come.</p><p>In this article I will provide a short overview of two kinds of challenges that I believe software engineers will have to confront in the next 20 years: the human and the technical. <span
id="more-1245"></span> <strong>The Human Factor</strong></p><p>A quick look at the agenda of the 29th International Conference on Software Engineering (held in Minneapolis last year, from May 20th to 26th, 2007) shows the key themes considered by the software engineering research community as the major challenges today:</p><ul><li>&#8220;Improving Software Practice through Education: Challenges and Future Trends&#8221;</li><li>&#8220;Research Collaborations between Industry and Academia&#8221;</li><li>&#8220;Model-driven Development of Complex Systems: A Research Roadmap&#8221;</li><li>&#8220;Source Code Analysis: A Road Map&#8221;</li><li>&#8220;Software Reliability Engineering: A Roadmap&#8221;</li><li>&#8220;Global Software Engineering: The Future of Socio-technical Coordination&#8221;</li><li>&#8220;Collaboration in Software Engineering: A Roadmap&#8221;</li><li>&#8220;Self-Managed Systems: An Architectural Challenge&#8221;</li><li>&#8220;Software Project Economics: A Road Map&#8221;</li></ul><p>(Source: ICSE 2007)</p><p>Mixed in with technical concerns, some presentations highlighted core problems that appear in the current state of software engineering: <strong>communication, collaboration and human issues.</strong></p><blockquote>The core substance of software deserves more eyes and more minds, thinking ways to describe not only the big picture (something that you can do with fancy diagrams) but also to give solutions to the problems that developers find daily while building systems up. 
Software is a process, but not any kind of process: a human one, maybe the most intangible of all processes; and as such, it is filled with all human brightnesses and failures.</blockquote><p>(Myself, in 2006)</p><p>I have the deep, strong conviction that software development cannot and must not be separated from the human-side problems of forming, keeping and training teams, improving internal and external communication, fostering individual creativity, and finding ways of reaching team consensus. As a powerful example, the seminal Peopleware book by DeMarco and Lister showed that many of the most successful software companies have been those that excelled in creating human-centric environments:</p><blockquote>In 1982, (Mitchell Kapor) founded Lotus Development Corporation, for which he is most noted. While there, he revolutionized corporate workplace culture by making diversity and inclusivity top priorities in his goal for creating an environment that attracted and retained employees. There were many &#8220;firsts&#8221; for Lotus, including being the first company to sponsor an AIDS Walk event in the mid-80&#8217;s and refusing to do business with South Africa due to Apartheid.</blockquote><p>(Sterling-Hoffman)</p><p>Thanks to a sharp hiring process, a series of innovations in their flagship spreadsheet product, and a progressive corporate culture, Lotus dominated the software landscape of the 80s. Today, Google follows closely in Lotus&#8217; footsteps (Google, 2007a), and their brilliant results in the last few years seem to confirm this trend. Google, for example, allows its employees to spend 20% of their time on their own projects (Google, 2007b). This has resulted in an incredible amount of code, used internally and also released as open-source projects:</p><blockquote>Google is a fantastic company to work for. I could cite numerous reasons why. 
Take the concept of &#8220;20 percent time.&#8221; Google engineers are encouraged to spend 20 percent of their time pursuing projects they&#8217;re passionate about. I started one such exciting project some time back, and I&#8217;m pleased to announce that Google is releasing the fruits of this project as an open source contribution to the Macintosh community. That project is MacFUSE, a Mac OS X version of the popular FUSE (File System in User Space) mechanism, which was created for Linux and subsequently ported to FreeBSD.</blockquote><p>(Google Mac Blog, 2007)</p><p>The empowerment of both the individual <strong>and</strong> the team (the emphasis is important here) is key for a successful software project.</p><p><strong>Parallelization</strong></p><p>Herb Sutter has put it very clearly: technically speaking, since the beginning of the decade, there is no way for getting more processing power without jumping to multicore architectures:</p><blockquote>The key question is: When will it end? After all, Moore’s Law predicts exponential growth, and clearly exponential growth can’t continue forever before we reach hard physical limits; light isn’t getting any faster.(&#8230;)
If you’re a software developer, chances are that you have already been riding the “free lunch” wave of desktop computer performance.(&#8230;)
Right enough, in the past. But dead wrong for the foreseeable future.</blockquote><p>(Sutter, 2005)</p><p>The problem is that <strong>more cores do not necessarily mean more computing power</strong>, because the jump made by chip manufacturers has not (yet) been completely followed by the software community. Of course there is the concept of &#8220;threads&#8221;, and multi-threaded applications can benefit from performance boosts when running on multicore hardware platforms; however, a number of myths have to be debunked, such as the common &#8220;2 x 3GHz = 6GHz&#8221; (as explained by Sutter here: <a
href="http://www.ddj.com/showArticle.jhtml?documentID=ddj0503a&#038;pgno=3">http://www.ddj.com/showArticle.jhtml?documentID=ddj0503a&amp;pgno=3</a>), and even more importantly, creating multithreaded applications is not easy. At all.</p><p>A couple of months ago, in the &#8220;Questions and Answers&#8221; section of LinkedIn.com, I answered an interesting question about parallelization; the following excerpt from my answer pretty much summarizes my opinions about the current state of multithreading, as well as some challenges that are raised for the future:</p><blockquote>The problem is simply that the &#8220;streamline&#8221; programming languages do not provide good ways to code multithreaded applications. (&#8230;) Not at all. The problem is real, since multithreaded applications are extremely complicated to think of, let alone develop properly. A line of code in a high-level language could mean several hundred instructions in a processor; and depending on the sharing algorithm used at the CPU level, each one of these instructions might be executed separately, sharing resources with other processes. So what happens when? (&#8230;)
What I mean is that the fact that the JVM and the CLR support threads does not make good .NET or Java developers good multithreading developers by default. It&#8217;s a different mindset; who is accessing your resources? (&#8230;)
I think that as long as programming languages do not take multitasking and multithreading as base features (and not as mere library or API add-ons) we will continue struggling with single-threaded applications that collide with each other.</blockquote><p>(Myself, this time on LinkedIn Answers, 2007)</p><p>I think that the challenge of parallelization is not only an extremely tough one, requiring what Thomas Kuhn calls a &#8220;paradigm shift&#8221;, but also an extremely huge business opportunity; after all, while the top of the Chinese ideogram for &#8220;Crisis&#8221; means &#8220;Danger&#8221;, the bottom part means &#8220;Opportunity&#8221; (Mary R. Bast, 1999).</p><p><strong>Very Large Systems</strong></p><p>I also think that software systems will invariably get bigger and bigger. And given the historically high risk of failure of software projects, the dependency on software of the modern society, the pervasiveness of the Internet, the low prices of connectivity and the overall globalization, it is more important than ever to get ready for those challenges.</p><p>In July 2006, the well known Software Engineering Institute of the Carnegie Mellon University published an impressive report (freely downloadable) called &#8220;Ultra-Large-Scale (ULS) Systems: The Software Challenge of the Future&#8221;:</p><blockquote>The study brought together experts in software and other fields to answer a question posed by the U.S. Army Office of the Assistant Secretary of the U.S. Army (Acquisition, Logistics &#038; Technology): “Given the issues with today’s software engineering, how can we build the systems of the future that are likely to have billions of lines of code?” Increased code size brings with it increased scale in many dimensions, posing challenges that strain current software foundations. 
The report details a broad, multi-disciplinary research agenda for developing the ultra-large-scale systems of the future.</blockquote><p>(SEI, CMU, 2006)</p><p>The 150-page long report gives an extremely detailed vision of the challenges raised by complex systems, in the following areas:</p><ul><li>Design</li><li>Monitoring</li><li>Human interaction</li><li>Computational Engineering</li><li>Deployment</li><li>Legal issues</li></ul><p>The report provides interesting conclusions, highlighting the methodologies and techniques that will be required to tackle these systems efficiently, among them the role of the W3C, the forthcoming trends of grid computing and parallelization, the Model-Driven Architecture (MDA) initiative of the OMG, and finally the development of larger Service-Oriented Architecture (SOA) platforms, such as .NET or J2EE (page 41 of the report).</p><p>The report also places a strong emphasis on the concept of socio-technical ecosystems and I think it&#8217;s worth a read by everyone interested in software engineering.</p><p><strong>Conclusion</strong></p><p>Given its youth, we have yet to see the most important developments in software engineering. However, it is extremely difficult to predict the future in this industry: Bill Gates himself published a book in 1995, &#8220;The Road Ahead&#8221;, in which he barely mentions the World Wide Web:</p><blockquote>&#8220;The Road Ahead&#8221; appeared in December 1995, just as Gates was unveiling Microsoft&#8217;s master plan to &#8220;embrace and extend&#8221; the Internet. Yet the book&#8217;s first edition, with its clunky accompanying CD-ROM, mentioned the Web a mere seven times in nearly 300 pages. 
Though later editions tried to correct this gaffe, &#8220;The Road Ahead&#8221; remains a landmark of bad techno-punditry &#8212; and a time-capsule illustration of just how easily captains of industry can miss a tidal wave that&#8217;s about to engulf them.</blockquote><p>(Salon.com, 2000)</p><p>In any case, I think that there are important challenges in our industry: the need for better human management, the jump to multicore architectures and multiprocessing, and the ever-growing size of software projects. These three elements will without any doubt change the shape of the industry in the years to come, and raise new challenges in turn.</p><p><strong>References</strong></p><p>Adrian Kosmaczewski on LinkedIn Answers, &#8220;For the software architects out there, do you feel there is an impending paradigm shift in the software development model, towards &#8220;parallel computing&#8221; models?&#8221;, January 2007, [Internet] <a
href="http://www.linkedin.com/answers?viewQuestion=&#038;questionID=7804&#038;askerID=4194838">http://www.linkedin.com/answers?viewQuestion=&amp;questionID=7804&amp;askerID=4194838</a> (Accessed June 3rd, 2007)</p><p>Adrian Kosmaczewski on Kosmaczewski.net, &#8220;What will the Software Architecture discipline look like in 10 years’ time?&#8221;, March 16th, 2006 [Internet] <a
href="http://kosmaczewski.net/2006/03/16/software-architecture-future/">http://kosmaczewski.net/2006/03/16/software-architecture-future/</a> (Accessed June 3rd, 2007)</p><p>Bast, Mary R; &#8220;Crisis: Danger &amp; Opportunity&#8221;, 1999 [Internet], <a
href="http://www.breakoutofthebox.com/crisis.htm">http://www.breakoutofthebox.com/crisis.htm</a> (Accessed June 3rd, 2007)</p><p>DeMarco, Tom &amp; Lister, Timothy, &#8220;Peopleware &#8211; Productive Projects and Teams, 2nd Edition&#8221;, 1999, Dorset House Publishing, ISBN 0-932633-43-9</p><p>Google, &#8220;Top 10 Reasons to Work at Google&#8221;, 2007a [Internet] <a
href="http://www.google.com/jobs/reasons.html">http://www.google.com/jobs/reasons.html</a> (Accessed June 3rd, 2007)</p><p>Google, &#8220;What&#8217;s it like to work in Engineering, Operations, &amp; IT?&#8221;, 2007b, [Internet] <a
href="http://www.google.com/support/jobs/bin/static.py?page=about.html">http://www.google.com/support/jobs/bin/static.py?page=about.html</a> (Accessed June 3rd, 2007)</p><p>Google Mac Blog, &#8220;Taming Mac OS X File Systems&#8221;, January 11th, 2007, [Internet] <a
href="http://googlemac.blogspot.com/2007/01/taming-mac-os-x-file-systems.html">http://googlemac.blogspot.com/2007/01/taming-mac-os-x-file-systems.html</a> (Accessed June 3rd, 2007)</p><p>ICSE, &#8220;Future of Software Engineering&#8221;, 2007, [Internet] <a
href="http://web4.cs.ucl.ac.uk/icse07/index.php?id=104">http://web4.cs.ucl.ac.uk/icse07/index.php?id=104</a> (Accessed June 3rd, 2007)</p><p>Software Engineering Institute, Carnegie-Mellon University, &#8220;Ultra-Large-Scale (ULS) Systems &#8211; The Report&#8221;, July 2006, [Internet] <a
href="http://www.sei.cmu.edu/uls/">http://www.sei.cmu.edu/uls/</a> (Accessed June 3rd, 2007)</p><p>Salon.com, &#8220;Why Bill Gates still doesn&#8217;t get the Net&#8221;, 2000, [Internet] <a
href="http://archive.salon.com/21st/books/1999/03/cov_30books.html">http://archive.salon.com/21st/books/1999/03/cov_30books.html</a> (Accessed June 3rd, 2007)</p><p>Sutter, Herb; &#8220;A Fundamental Turn Toward Concurrency in Software&#8221;, 2005, [Internet] <a
href="http://www.ddj.com/dept/architect/184405990">http://www.ddj.com/dept/architect/184405990</a> (Accessed June 3rd, 2007)</p><p>Sterling-Hoffman, &#8220;Opening Doors To Higher Education&#8221;, [Internet] <a
href="http://www.sterlinghoffman.com/newsletter/articles/article140.html">http://www.sterlinghoffman.com/newsletter/articles/article140.html</a> (Accessed June 3rd, 2007)</p><p>Wikipedia, &#8220;Thomas Kuhn&#8221; [Internet], <a
href="http://en.wikipedia.org/wiki/Thomas_Kuhn">http://en.wikipedia.org/wiki/Thomas_Kuhn</a> (Accessed June 3rd, 2007)</p> ]]></content:encoded> <wfw:commentRss>http://kosmaczewski.net/challenges-for-software-engineers/feed/</wfw:commentRss> <slash:comments>4</slash:comments> </item> <item><title>Django Architecture Approaches</title><link>http://kosmaczewski.net/django-architecture-approaches/</link> <comments>http://kosmaczewski.net/django-architecture-approaches/#comments</comments> <pubDate>Fri, 04 Apr 2008 11:59:12 +0000</pubDate> <dc:creator>Adrian</dc:creator> <category><![CDATA[Architecture]]></category> <category><![CDATA[Django]]></category> <category><![CDATA[Open Source]]></category> <category><![CDATA[Python]]></category> <guid
isPermaLink="false">http://kosmaczewski.net/?p=1141</guid> <description><![CDATA[I&#8217;ve just had a very interesting conversation with my colleague Marco about different approaches to the organization of code inside a Django application. As you might know (and if you don&#8217;t I&#8217;ll tell you anyway), Django&#8217;s views (somehow occupying the &#8230; <a
href="http://kosmaczewski.net/django-architecture-approaches/">Continue reading <span
class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<p>I&#8217;ve just had a very interesting conversation with my colleague <a
href="http://djangopeople.net/mbi/">Marco</a> about different approaches to the organization of code inside a Django application.</p><p>As you might know (and if you don&#8217;t I&#8217;ll tell you anyway), Django&#8217;s views (which more or less occupy the &#8220;Controller&#8221; level in an MVC architecture) must take (at least) an <a
href="http://www.djangoproject.com/documentation/request_response/">HttpRequest</a> instance as a parameter and must return an <a
href="http://www.djangoproject.com/documentation/request_response/">HttpResponse</a> instance. <strong>That&#8217;s how it goes in Django, this is </strong><a
href="http://classics.mit.edu/Hippocrates/hippolaw.html"><strong>the law</strong></a><strong>.</strong> This means that the last instruction in your request processing code (in whichever way you&#8217;ve organized it) must return an HttpResponse instance, usually by calling the HttpResponse() constructor (or that of one of its useful subclasses), or by calling the django.shortcuts.render_to_response() function, or something similar.</p><p>This has, in my opinion, a major drawback: it might limit code reuse and it increases the coupling in the code. All is not lost, however.<span
id="more-1141"></span></p><p>Before you start the flame wars, let me explain, using an <a
href="http://www.djangoproject.com/documentation/tutorial03/">example coming from the Django website</a>; this represents a basic Django view function, returning some response containing data fetched from the database:</p><p>[source:python]
from django.shortcuts import render_to_response, get_object_or_404
# ...
def detail(request, poll_id):
    p = get_object_or_404(Poll, pk=poll_id)
    return render_to_response('polls/detail.html', {'poll': p})
[/source]</p><p>Let&#8217;s say now that I want to reuse that particular data (the &#8216;p&#8217; variable) in another view: given that the return value is always an HttpResponse instance, you are screwed; sometimes you just need the data, to find something, or simply to render it in another format like JSON or XML (RESTful architectures, anyone?). This goes pretty much against the <a
href="http://en.wikipedia.org/wiki/Don't_repeat_yourself">DRY</a> principles, and if you don&#8217;t go deeper than the Django tutorials, your whole application might feature lots of repeated code.</p><p>Even worse, you have a direct reference to a template (&#8220;polls/detail.html&#8221;), and this kind of coupling does not scale well. It can become a real problem in big projects.</p><p>There are, however, strategies to avoid this: the first, the most common, is to refactor your code and to create a &#8220;layer&#8221; of data-specific functions, which will return instances (or arrays thereof) that you can reuse here and there. Doing this in a big project already started requires a good deal of unit testing first, to ensure that your refactoring is not breaking something elsewhere, but that&#8217;s another problem (because you DO unit test, right??). This approach might not scale well in complex projects, and thus you would like to organize your code in other ways.</p><p>I learnt about organizing views using <a
href="http://docs.python.org/ref/callable-types.html">callable objects</a> instead of functions while studying the code in the <a
href="http://code.google.com/p/django-rest-interface/">Django REST Interface project</a>. In this case, you create <a
href="http://code.google.com/p/django-rest-interface/source/browse/trunk/django_restapi/resource.py">code like this</a>:</p><p>[source:python]
class Resource(ResourceBase):
    """
    Generic resource class that can be used for
    resources that are not based on Django models.
    """
    # ... snip ...
    def __call__(self, request, *args, **kwargs):
        """
        Redirects to one of the CRUD methods depending
        on the HTTP method of the request. Checks whether
        the requested method is allowed for this resource.
        """
        # Check permission
        if not self.authentication.is_authenticated(request):
            response = HttpResponse(_('Authorization Required'), mimetype=self.mimetype)
            challenge_headers = self.authentication.challenge_headers()
            response._headers.update(challenge_headers)
            response.status_code = 401
            return response
        try:
            return self.dispatch(request, self, *args, **kwargs)
        except HttpMethodNotAllowed:
            response = HttpResponseNotAllowed(self.permitted_methods)
            response.mimetype = self.mimetype
            return response
[/source]</p><p>The important bit here is the &#8220;<strong>__call__</strong>&#8221; method, which allows an instance to be called as a function, without specifying any particular method. This reminds me of the dreadful <a
href="http://www.vbmigration.com/detknowledgebase.aspx?Id=309">VB default methods</a> but in Python it&#8217;s not that bad, actually (<a
href="http://kosmaczewski.net/2005/09/15/land-of-the-forbidden-maneuver/">VB is horrible by default</a> anyway), and allows you to use a cool syntax to do complex tricks (&#8220;command pattern&#8221; way of doing things, without the method call overload). And of course, since you are using an object-oriented approach, you can use polymorphism and inheritance to organize and reuse code as much as you can (or want).</p><p>Finally, Marco told me that his team uses another cool approach: they avoid returning HttpResponse instances from the views, and instead use <a
href="http://www.python.org/dev/peps/pep-0318/">Python decorators</a> to generate those. This way, you can achieve another neat separation of concerns, and you can reuse code simply and effectively.</p><p>I understand that the <a
href="http://en.wikipedia.org/wiki/Python_philosophy#Programming_philosophy">Python philosophy</a> cares about explicitness, but the &#8220;easy&#8221; way of processing requests in Django leads to trouble in big applications: increased coupling, reduced DRY, more headaches. I think you should use some code-reuse strategy in your Django code, but this, of course, is more an architectural problem than a Django problem.</p> ]]></content:encoded> <wfw:commentRss>http://kosmaczewski.net/django-architecture-approaches/feed/</wfw:commentRss> <slash:comments>11</slash:comments> </item> <item><title>About Operating Systems, Abstractions and APIs</title><link>http://kosmaczewski.net/about-os-abstraction-api/</link> <comments>http://kosmaczewski.net/about-os-abstraction-api/#comments</comments> <pubDate>Sat, 15 Dec 2007 15:59:08 +0000</pubDate> <dc:creator>Adrian</dc:creator> <category><![CDATA[Architecture]]></category> <category><![CDATA[Papers]]></category> <category><![CDATA[Software]]></category> <category><![CDATA[abstractions]]></category> <category><![CDATA[API]]></category> <category><![CDATA[Code]]></category> <guid
isPermaLink="false">http://kosmaczewski.net/2007/12/15/about-os-abstraction-api/</guid> <description><![CDATA[Introduction Charles Petzold, in his book &#8220;Code&#8221;, states the following: In theory, application programs are supposed to access the hardware of the computer only through the interfaces provided by the operating system. But many application programmers who dealt with small &#8230; <a
href="http://kosmaczewski.net/about-os-abstraction-api/">Continue reading <span
class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<p><strong>Introduction</strong></p><p>Charles Petzold, in his book &#8220;Code&#8221;, states the following:</p><blockquote>In theory, application programs are supposed to access the hardware of the computer only through the interfaces provided by the operating system. But many application programmers who dealt with small computer operating systems of the 1970s and early 1980s often bypassed the operating system, particularly in dealing with the video display. Programs that directly wrote bytes into video display memory ran faster than programs that didn&#8217;t. Indeed, for some applications &#8211; such as those that needed to display graphics on the video display &#8211; the operating system was totally inadequate. What many programmers liked most about MS-DOS was that it &#8216;stayed out of the way&#8217; and let programmers write programs as fast as the hardware allowed.</blockquote><p>(Charles Petzold, &#8220;Code&#8221;, pages 332 &amp; 333)</p><p>This paragraph shows the state of things during the era of MS-DOS and the early versions of Windows (from the late 1970s until approximately 2000). During this time, programmers could directly access computer memory, bypassing the APIs offered by the operating system, and thus having total control of the hardware.</p><p>This shows two different trends in computer programming, one that respects the functionality offered by the operating system, and another that bypasses it. There are advantages and disadvantages to each approach, and the following paragraphs show some of them. <span
id="more-1017"></span> <strong>The Conflict</strong></p><p>The Apple Macintosh (1984) and the NeXT computer (1989) were among the first systems to introduce a complete API (<a
href="http://en.wikipedia.org/wiki/API">Application Programming Interface</a>) that completely shielded application developers from directly accessing the hardware on which the application ultimately runs. In the case of the Apple Macintosh, this API could be programmed in Pascal, while the NeXT used the Objective-C language.</p><p>The tradeoff between the MS-DOS approach and the API one can be summarized in three areas: <strong>performance, portability &amp; maintenance, and security.</strong> The first one, considered critical in the 80s, was one of the major factors in the success of the &#8220;IBM PC compatible&#8221; + MS-DOS platform; similar applications could run faster than in Macintosh environments; also, a larger number of games (which use graphics heavily) was available for that platform, and this also led a majority of people to choose it.</p><p>Direct access to the hardware brought a first problem, that of maintenance and portability; indeed, software written this way was too hard to port to different operating systems or processor architectures, since it relied heavily on the availability of certain hardware interrupts and circuitry. On the other hand, Apple Macintosh software created for the first version of the Mac OS (1984) could run seamlessly, without recompilation (just copy the executable and double-click on it), until Mac OS 9 (1999).</p><p><strong>Microsoft Windows</strong></p><p>Regarding Microsoft Windows, the situation is slightly more complicated. Windows began its life as a GUI around MS-DOS in 1985, and from version 95 onwards it gradually became a more independent system, without ever truly becoming a multi-threaded, multi-tasking operating system. 
This system used to support old MS-DOS software, allowing it to run natively:</p><blockquote>I first heard about this from one of the developers of the hit game SimCity, who told me that there was a critical bug in his application: it used memory right after freeing it, a major no-no that happened to work OK on DOS but would not work under Windows where memory that is freed is likely to be snatched up by another running application right away. (&#8230;) They reported this to the Windows developers, who disassembled SimCity, stepped through it in a debugger, found the bug, and added special code that checked if SimCity was running, and if it did, ran the memory allocator in a special mode in which you could still use memory after freeing it.
This was not an unusual case. The Windows testing team is huge and one of their most important responsibilities is guaranteeing that everyone can safely upgrade their operating system, no matter what applications they have installed, and those applications will continue to run, even if those applications do bad things or use undocumented functions or rely on buggy behavior that happens to be buggy in Windows n but is no longer buggy in Windows n+1. In fact if you poke around in the AppCompatibility section of your registry you&#8217;ll see a whole list of applications that Windows treats specially, emulating various old bugs and quirky behaviors so they&#8217;ll continue to work.</blockquote><p>(Joel Spolsky, 2004)</p><p>As you can see, direct hardware access is not only a problem for applications developers&#8230; it was one for Microsoft as well.</p><p>Simultaneously, <a
href="http://en.wikipedia.org/wiki/Windows_NT">Windows NT</a> was started in 1988 by another team, largely composed of engineers who had worked on the Digital Equipment Corporation OpenVMS system. The NT kernel features, among several distinctive characteristics, the HAL (Hardware Abstraction Layer):</p><blockquote>A hardware abstraction layer (HAL) is an abstraction layer between the physical hardware of a computer and the software that runs on that computer. Its function is to hide differences in hardware and therefore provide a consistent platform to run applications on.</blockquote><p>(Wikipedia, 2006)</p><p>NT&#8217;s HAL abstracts the complete hardware beneath, and the only way to access hardware functionality is through the API. The NT team&#8217;s approach was radically different from that of the &#8220;classic&#8221; Windows team; they would not fix compatibility issues application by application, but would rather define an API upfront and publish it to the developers.</p><p>The NT kernel has since replaced the older 9x kernel, from version XP onwards. This shift broke many old software packages, which are not able to run properly (if at all) in the new versions of Windows. Only those programs that use the <a
href="http://en.wikipedia.org/wiki/Windows_API">Windows API</a> exclusively are able to run properly (I have a copy of <a
href="http://en.wikipedia.org/wiki/Lotus_Improv">Lotus Improv 3.1</a>, bought in 1992, that I used to run under Windows 3.1, and that runs perfectly well under Windows XP&#8230;!)</p><p><strong>Present Situation</strong></p><p>Nowadays the &#8220;performance&#8221; characteristic named above has been replaced by the security concern; code that can directly access the computer hardware without permission checks (particularly in the age of the Internet) is potentially extremely dangerous: chapter 4 of the book &#8220;Hacking Exposed&#8221; (ISBN 0-0721-2127-0), by Joel Scambray, Stuart McClure and George Kurtz, describes dozens of different known vulnerabilities caused by direct hardware access in Windows 95, 98 and ME &#8211; that is, the non-NT kernel. In other words, such systems connected directly to the Internet are simply too vulnerable to be safely usable.</p><p>The conflict between calling an API and accessing the hardware directly has today been won by the API approach; consumer operating systems, at least, have taken this road and there is no turning back. The advantages are evident: the same source code base can be used to build the same application on several different platforms, lowering maintenance and support costs; platform vendors can improve performance and security while reducing the impact on existing software packages; application developers can share knowledge, tips and tricks; security is built from the ground up.</p><p>On the other hand, the growth in capacity of operating systems makes limited resources appear unlimited: memory (using paging on disk), printers (using print queues) or even screen desktop space (using multiple desktops such as the <a
href="http://www.gnome.org/">GNOME</a> approach, or on-screen gadgets such as <a
href="http://www.apple.com/macosx/">Expose</a> on the Mac).</p><p><strong>Conclusion</strong></p><p>Currently there are APIs for graphic manipulation, such as <a
href="http://www.opengl.org/">OpenGL</a>, that allow software such as <a
href="http://earth.google.com/">Google Earth</a> to run in different hardware architectures using the same code base. This is possible thanks to the higher power of today&#8217;s hardware, and to the advances in operating system design, that make the overhead of API calls a non-significant portion of the overall CPU time needed to execute programs. Thus, the conflict has largely been resolved, in my opinion.</p><p><strong>References</strong></p><p>Charles Petzold, &#8220;Code &#8211; The Hidden Language of Computer Hardware and Software&#8221;, 2000, Microsoft Press, ISBN 0-7356-1131-9 (<a
href="http://www.charlespetzold.com/code/">website</a>)</p><p>Joel Spolsky, &#8220;How Microsoft Lost the API War&#8221;, June 13, 2004 [Internet], <a
href="http://www.joelonsoftware.com/articles/APIWar.html">http://www.joelonsoftware.com/articles/APIWar.html</a>, (Accessed December 14th, 2007)</p><p>Wikipedia, &#8220;Hardware abstraction layer&#8221; [Internet], <a
href="http://en.wikipedia.org/wiki/Hardware_abstraction_layer">http://en.wikipedia.org/wiki/Hardware_abstraction_layer</a> (Accessed December 14th, 2007)</p> ]]></content:encoded> <wfw:commentRss>http://kosmaczewski.net/about-os-abstraction-api/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>POSIX Device Files</title><link>http://kosmaczewski.net/posix-device-files/</link> <comments>http://kosmaczewski.net/posix-device-files/#comments</comments> <pubDate>Fri, 27 Jul 2007 10:50:36 +0000</pubDate> <dc:creator>Adrian</dc:creator> <category><![CDATA[Architecture]]></category> <category><![CDATA[Papers]]></category> <category><![CDATA[Technology]]></category> <guid
isPermaLink="false">http://kosmaczewski.net/2007/07/27/posix-device-files/</guid> <description><![CDATA[Introduction Modern operating systems provide a clear separation of the kernel processes from those running in user space, which prompts the question of how to access I/O devices from user processes, without breaking the above mentioned architectural separation, which guarantees &#8230; <a
href="http://kosmaczewski.net/posix-device-files/">Continue reading <span
class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<p><strong>Introduction</strong></p><p>Modern operating systems provide a clear separation of the kernel processes from those running in user space, which prompts the question of how to access I/O devices from user processes, without breaking the above mentioned architectural separation, which guarantees stability, security and performance.</p><p>Several approaches are available, depending on the level of abstraction used and the context of the problem. One solution is to use POSIX Device Files, which provide the same system-call interface for both files and devices. This article will describe the POSIX standard and the POSIX device files, and give a short enumeration of their advantages and disadvantages. <span
id="more-909"></span> <strong>The POSIX Standard</strong></p><p>POSIX stands for &#8220;Portable Operating System Interface for Unix&#8221;, and it is the &#8220;collective name of a family of related standards specified by the IEEE to define the application programming interface (API) for software compatible with variants of the Unix operating system.&#8221; (Wikipedia). Nearly all of today&#8217;s most important operating systems, in either closed or open source form, are POSIX-compliant to a certain degree: Solaris and recent versions of Apple Mac OS X, for example, are certified as POSIX-compliant, while Linux, FreeBSD, OpenVMS and Nucleus RTOS are mostly compliant, and Microsoft Windows NT-based systems offer POSIX compatibility only through an optional subsystem or add-on layers (Wikipedia). POSIX Conformance Test Suites (for example <a
href="http://www.itl.nist.gov/div897/ctg/posix_form.htm">the one provided by NIST</a>) are used to determine the level of POSIX-compliance of an operating system.</p><p>The importance of the POSIX standard is hard to assess; however, it is widely recognized that it has had an enormous impact on the computing world as we know it today:</p><blockquote>The overall Unix market (Figure 1) boasts billions of dollars in revenues, and is projected to continue growing into the future. Major areas for this growth feature high-end systems (data warehousing, servers, and supercomputing) rather than desktop applications. Brian Unter evaluated the impact of Posix standards in this area as responsible for up to 30% of Hewlett-Packard’s Unix business.</blockquote><p>(Isaak &amp; Johnson, 1998)</p><p>The POSIX standard provides the whole industry with a single, strong foundation, allowing interoperability, benchmarking and standardization, which in turn helps reduce costs, making the whole industry more efficient.</p><p><a
href="http://flickr.com/photos/uwehermann/85855177/"><img
src='http://kosmaczewski.net/wp-content/uploads/2007/07/85855177_cd67b80fdf.jpg' alt='85855177_cd67b80fdf.jpg' /></a></p><p><strong>POSIX Device Files</strong></p><p>One of the key aspects of the POSIX standard is the definition of a common gateway to access I/O devices from the user space, avoiding a direct and probably dangerous communication with the kernel, and known as POSIX Device Files: &#8220;These interfaces enable communication with serial, storage, and network devices through device files. In any UNIX-based system such as BSD, a device file is a special file located in /dev that represents a block or character device such as a terminal, disk drive, or printer. If you know the name of a device file (for example, disk0s2 or mt0) your application can use POSIX functions such as open, read, write, and close to access and control the associated device.&#8221; (Apple)</p><p>What does all of this mean? In short, this means that</p><blockquote>Under UNIX, every piece of hardware is a file. To demonstrate this, try view the file /dev/hda
less -f /dev/hda
/dev/hda is not really a file at all. When you read from it, you are actually reading directly from the first physical hard disk of your machine. /dev/hda is known as a device file, and all of them are stored under the /dev directory.</blockquote><p>(Sheer)</p><p><strong>Advantages</strong></p><p>The main advantage is homogeneity. Using POSIX device files, developers can manipulate, read and write data from external I/O devices without having to directly access the operating system kernel, using a unified programming model, with a standardized set of semantics, which leads to systems designed with elegance and correctness.</p><p>Another advantage is tightly related to the concept of &#8220;streams&#8221;, as understood by many higher-level programming languages. Streams allow developers to access devices in a uniform way; C++, Java, the .NET languages, and even dynamic languages such as Ruby or Lisp allow developers to treat sequences of bytes using this uniform concept:</p><blockquote>A stream is an object that can be used with an input or output function to identify an appropriate source or sink of characters or bytes for that operation</blockquote><p>(Lisp.org)</p><p>Java and .NET have big inheritance trees below their abstract stream classes (InputStream and OutputStream in Java, Stream in .NET), handling memory-based, file, network or I/O device-based streams, all of them sharing a similar structure and behavior: mainly, the capability of being read and / or written in a sequential fashion.</p><p>This paradigm, coupled with the use of POSIX device files, allows software architects to create flexible designs, where the streams are treated polymorphically, can be exchanged at runtime as needed, and have fewer dependencies among them.</p><p><strong>Disadvantages</strong></p><p>The main disadvantage might be the maintainability of the code that uses these abstractions, since this notation and these semantics might not be immediately obvious to a newly graduated programmer.</p><p>The most common programming experience, in
higher-level environments, usually relies on higher abstractions to access devices, for example the System.IO.Ports.SerialPort class in the .NET programming model:</p><blockquote>To reduce the programming effort that is required when working with serial ports, the .NET Compact Framework 2.0 includes the SerialPort class. The SerialPort class provides a simplified abstraction over serial communications ports that provides a number of features that simplify monitoring and configuring serial ports. The serial port also simplifies sending and receiving data with serial ports—including the automatic encoding and decoding of data sent to and received from the port</blockquote><p>(Microsoft)</p><p>Mac OS X also provides higher-level abstractions for managing devices:</p><blockquote>A device interface is a plug-in interface between the kernel and a process in user space. The interface conforms to the plug-in architecture defined by Core Foundation Plug-in Services (CFPlugIn), which, in turn, is compatible with the basics of Microsoft’s Component Object Model (COM). In the CFPlugIn model, the kernel acts as the plug-in host with its own set of well-defined I/O Kit interfaces, and the I/O Kit framework provides a set of plug-ins (device interfaces) for applications to use</blockquote><p>(Apple)</p><p>The abstraction-level problem also arises when porting code from one operating system to another, since a common abstraction to access a CD-ROM drive in Windows (usually &#8220;D:\&#8221;) is different from the way that Linux does it (usually &#8220;/cdrom&#8221;) or the way that Mac OS X does (which is using the friendly name of the CD-ROM under the &#8220;/Volumes&#8221; directory). The use of POSIX device files might help avoid these problems.</p><p><strong>Conclusion</strong></p><p>Both the advantages and disadvantages of using the same system-call interface are relative to the problem being solved, the available budget and the skills of the engineering team. 
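</p><p>To make the phrase &#8220;same system-call interface&#8221; concrete, here is a minimal sketch in Python, assuming a Unix-like system where /dev/zero exists. The helper name read_bytes is made up for this illustration; os.open, os.read and os.close are thin wrappers over the corresponding POSIX calls:</p>

```python
import os
import tempfile


def read_bytes(path, count):
    """Read up to `count` bytes from `path` using raw POSIX-style calls.

    Hypothetical helper for illustration only; it does not care whether
    `path` names a regular file or a device file.
    """
    fd = os.open(path, os.O_RDONLY)   # open(2)
    try:
        return os.read(fd, count)     # read(2)
    finally:
        os.close(fd)                  # close(2)


# A regular file: create one so that the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello, POSIX")
regular_file = tmp.name

from_file = read_bytes(regular_file, 5)
os.remove(regular_file)

# A character device: /dev/zero yields an endless stream of zero bytes.
# The existence check is only there for systems without that device.
from_device = read_bytes("/dev/zero", 4) if os.path.exists("/dev/zero") else None
```

<p>Nothing in read_bytes knows which kind of object it is reading from; the kernel dispatches the same calls either to the file system or to a device driver. The flip side, as noted above, is that the code alone does not tell a newcomer that one of those paths is a device.</p><p>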
There are huge advantages to sticking to the POSIX standard, but there is also a cost in maintainability and readability.</p><p><strong>References</strong></p><p>Apple Developer Connection, &#8220;I/O Kit Fundamentals, Architectural Overview: Controlling Devices From Outside the Kernel&#8221;, [Internet] <a
href="http://developer.apple.com/documentation/DeviceDrivers/Conceptual/IOKitFundamentals/ArchitectOverview/chapter_3_section_7.html">http://developer.apple.com/documentation/DeviceDrivers/Conceptual/IOKitFundamentals/ArchitectOverview/chapter_3_section_7.html</a> (Accessed February 11th, 2007)</p><p>Isaak, J.; Johnson, L.; &#8220;Posix/Unix standards: foundation for 21st century growth&#8221;, Micro, IEEE, Volume 18,  Issue 4,  July-Aug. 1998 Page(s):88, 87; Digital Object Identifier 10.1109/40.710874</p><p>Lisp.org, &#8220;System Class STREAM&#8221;, [Internet] <a
href="http://www.lisp.org/HyperSpec/Body/syscla_stream.html">http://www.lisp.org/HyperSpec/Body/syscla_stream.html</a> (Accessed February 11th, 2007)</p><p>Microsoft, &#8220;What&#8217;s New in the .NET Compact Framework 2.0&#8243;, [Internet] <a
href="http://msdn2.microsoft.com/en-us/library/aa446574.aspx">http://msdn2.microsoft.com/en-us/library/aa446574.aspx</a> (Accessed February 11th, 2007)</p><p>NIST, &#8220;PCTS:151-2, POSIX Test Suite&#8221;, [Internet] <a
href="http://www.itl.nist.gov/div897/ctg/posix_form.htm">http://www.itl.nist.gov/div897/ctg/posix_form.htm</a> (Accessed February 11th, 2007)</p><p>Sheer, P.; &#8220;UNIX devices&#8221;, [Internet] <a
href="http://www.cacs.louisiana.edu/~mgr/404/burks/linux/rute/node18.htm">http://www.cacs.louisiana.edu/~mgr/404/burks/linux/rute/node18.htm</a> (Accessed February 11th, 2007)</p><p>Wikipedia, &#8220;POSIX&#8221;, [Internet] <a
href="http://en.wikipedia.org/wiki/POSIX">http://en.wikipedia.org/wiki/POSIX</a> (Accessed February 11th, 2007)</p> ]]></content:encoded> <wfw:commentRss>http://kosmaczewski.net/posix-device-files/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>AOP &amp; The DataServices Project</title><link>http://kosmaczewski.net/aop-the-dataservices-project/</link> <comments>http://kosmaczewski.net/aop-the-dataservices-project/#comments</comments> <pubDate>Tue, 27 Mar 2007 07:49:22 +0000</pubDate> <dc:creator>Adrian</dc:creator> <category><![CDATA[Architecture]]></category> <category><![CDATA[Papers]]></category> <category><![CDATA[Software]]></category> <guid
isPermaLink="false">http://kosmaczewski.net/2007/03/27/aop-the-dataservices-project/</guid> <description><![CDATA[Introduction Five years ago I worked as a Software Engineer for a startup, based in Geneva, Switzerland, which had the goal of creating a web-based systems management console, to control and monitor the status of large computer installations, much like &#8230; <a
href="http://kosmaczewski.net/aop-the-dataservices-project/">Continue reading <span
class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<p><strong>Introduction</strong></p><p>Five years ago I worked as a Software Engineer for a startup, based in Geneva, Switzerland, which had the goal of creating a web-based systems management console, to control and monitor the status of large computer installations, much like <a
href="http://www.microsoft.com/smserver/">Microsoft SMS (Systems Management Server)</a> does. This tool would eventually benefit from being a web-based application, and as such it could be used from anywhere, without having to install a &#8220;fat client&#8221;; just launch a browser, point to a particular URL, and you are done.</p><p>During the project, I was able to work towards the creation of the first AJAX application I&#8217;ve ever seen (this was 2002!), and also, to use Aspect-Oriented Programming techniques for the first time.</p><p><a
href="http://flickr.com/photos/schani/72396562/"><img
src='http://kosmaczewski.net/wp-content/uploads/2007/03/72396562_c79cbd7ba0.jpg' alt='72396562_c79cbd7ba0.jpg' /></a> <span
id="more-755"></span> <strong>The Architecture</strong></p><p>This tool was designed as a three-systems project:</p><ol><li>On the server side, the back-end: it used &#8220;software spiders&#8221; that crawled your local network for information about the domain computers, using their <a
href="http://msdn.microsoft.com/library/en-us/wmisdk/wmi/wmi_start_page.asp">WMI (Windows Management Instrumentation) services</a> to gather useful information such as processor type, network card address, memory size, CPU usage, any kind of information. This information was sent to the server, where it was stored in a SQL Server 2000 database, for quicker retrieval and reporting.</li><li>On the client side, a web based application, running on top of IIS (Internet Information Services) and implemented as an ASP application (Active Server Pages). This tool was the first <a
href="http://en.wikipedia.org/wiki/AJAX">AJAX (Asynchronous JavaScript and XML)</a> application I had seen at the time (2002!), giving users a real-time experience of what was going on in the network; for example (this was impressive) you could just plug a new computer on the domain, and a couple of seconds later, the information would appear in your web browser automatically, without having to refresh the page (the client AJAX component polled the back-end every 5 or 10 seconds for updates).</li><li>In between both ends, the ASP application connected to the SQL Server backend using a <a
href="http://www.microsoft.com/com/">COM+</a> &#8220;proxy&#8221; component, which was the only means of communication between both subsystems.</li></ol><p>The development team was split in three: one group worked on the spider and database back-end, and another (where I was) worked on the ASP client application. Finally, a highly skilled C++ programmer had the task of building the &#8220;proxy&#8221; component that allowed both systems to communicate. This component was critical, and was designed for extremely high performance. Its design would deserve an article of its own; to mention just a few features, it offered transactional requests, complex event and feedback models, and extensive queuing capabilities via MSMQ (Microsoft Message Queuing).</p><p>Just for the record, regarding the methodologies used in the project, it is worth noting that the whole project was managed using the <a
href="http://www-306.ibm.com/software/awdtools/rup/">Rational Unified Process (RUP)</a>. It must be said that in the case of this startup company, this methodology was &#8220;overkill&#8221;, and the documentation was never kept in sync when changes were made to the architecture. Being a startup, we had to adapt the software as quickly as possible, and often this meant skipping the documentation process.</p><p><strong>The DataServices Project</strong></p><p>The COM+ component had extremely high throughput and short response times, but for the sake of interoperability, it required the ASP client application to send and consume complex XML requests and responses, using an industry standard called &#8220;CIM&#8221; (Common Information Model). This XML standard, designed by the <a
href="http://www.dmtf.org/">DMTF (Distributed Management Task Force)</a>, is part of the <a
href="http://www.dmtf.org/standards/wbem/">WBEM initiative</a>, which allows for XML-based interoperability of computer management systems. It is worth noting that the Microsoft Systems Management Server (SMS) is based on this standard as well, and that the Windows WMI services are also based on these standards.</p><p>The problem, for the web applications team, was then to create long and complex XML requests to send to the COM+ proxy component. The web developers, already struggling with a complex AJAX-based UI, had real trouble figuring out the different parameters for these calls and creating the proper XML structure (the CIM XML format is an extremely complex tree structure, with several layers of information and a high level of redundancy). This led to longer development cycles, since every new use case implementation required a good deal of new server requests to be written, with little reuse of existing ones.</p><p>This is how the &#8220;DataServices&#8221; subproject was born: to create a layer of abstraction between the COM+ component and the dynamic HTML application. The goal was to provide a simpler way to connect the back-end with the user interface, so that the development team in the UI layer could concentrate on the user experience, rather than on creating long XML documents. 
This was slowing development down, and was a well-known bottleneck for the project.</p><p>To avoid impacting the existing ASP application, and to leverage XML capabilities, this project was created as an ASP.NET application, using the recently released .NET Framework, version 1.0 (released in February 2002) and the C# language.</p><p>This ASP.NET application would handle requests from the dynamic HTML application while providing a much simpler API for developers; for instance, instead of having to create long XML strings to retrieve the list of processes running on a given computer of the domain, the developers could just call &#8220;ComputerSystem.GetRunningProcesses(machineName)&#8221; (a &#8220;facade&#8221; method) and that would do it.</p><p><strong>Aspect-Oriented Programming</strong></p><p>Once the DataServices subproject began growing, we noticed that we were performing the same operations everywhere, for each &#8220;facade&#8221; method call:</p><ul><li>Deserialization of requests</li><li>Security checks</li><li>Instrumentation (mainly logging, but also performance management and throughput)</li><li>Back-end connection management (opening, pooling, closing)</li><li>Error management</li><li>Serialization of responses</li></ul><p>The same lines of code were repeated around almost every internal operation done by the DataServices infrastructure, but this code could not be properly encapsulated in an external component, to be reused instead of duplicated.</p><p>More or less at the same time, one of the team members read a paper about the newly-born idea of <a
href="http://www.aosd.net/">&#8220;Aspect Oriented Programming&#8221; (AOP)</a>, and about the basic AOP capabilities of the .NET Framework (Shukla, Fell and Sells, 2002).</p><p>Gregor Kiczales from Xerox PARC, one of the creators of AOP, explains that</p><blockquote> &#8220;We have found many programming problems for which neither procedural nor object-oriented programming techniques are sufficient to clearly capture some of the important design decisions the program must implement.  This forces the implementation of those design decisions to be scattered throughout the code, resulting in “tangled” code that is excessively difficult to develop and maintain.  We present an analysis of why certain design decisions have been so difficult to clearly capture in actual code.  We call the properties these decisions address aspects, and show that the reason they have been hard to capture is that they cross-cut the system’s basic functionality. We present the basis for a new programming technique, called aspect-oriented programming, that makes it possible to clearly express programs involving such aspects, including appropriate isolation, composition and reuse of the aspect code.&#8221;</blockquote><p>(Kiczales et al., 1997)</p><p>The following two images highlight the location, in the Apache Tomcat project, of different sections of code that handle URL pattern matching and logging:</p><p><img
src='http://kosmaczewski.net/wp-content/uploads/2007/03/aop_modularized.png' alt='aop_modularized.png' /></p><p>(Source: Hilsdale, Kiczales, 2003)</p><p><img
src='http://kosmaczewski.net/wp-content/uploads/2007/03/aop_nonmodularized.png' alt='aop_nonmodularized.png' /></p><p>(Source: Hilsdale, Kiczales, 2003)</p><p>The idea of AOP is to provide a mechanism to centralize these lines of code in a single location, which would provide better management and maintainability in the future; Jacobson further explains that</p><blockquote> &#8220;Even though peer use cases are separate during requirements, their separation is not preserved during implementation. The realization of a use case touches many classes (scattering), and a class contains pieces of several use-case realizations (tangling). As a serious consequence, the realization of each use case gets dissolved in a sea of classes.&#8221;</blockquote><p>(Jacobson, Ng, 2005)</p><p><img
src='http://kosmaczewski.net/wp-content/uploads/2007/03/aop_usecase.png' alt='aop_usecase.png' /></p><p>(Source: Jacobson, Ng, 2005, page 33)</p><p>Regarding DataServices, suffice to say that AOP seemed to us as a good solution for our problem. Tempted by the discovery of a new paradigm, we decided to give a try to AOP, and the results proved to be worth the effort. In our case, we intercepted each message sent to the classes implementing the use cases (like the ComputerSystem.GetRunningProcesses(machineName) example above), &#8220;injecting&#8221; behavior before and after the associated method calls, providing the services enumerated above (instrumentation, error handling, etc).</p><p>Using DataServices, you could define methods like this:</p><p>[source:c#]
[Log()]
[RequiresRole(Role.Admin)]
[Database(ConnectionType.SQLServer)]
public bool CreateRecord([RegExp("[a-z]*")] string name, [Range(1, 99)] int age)
{
// just open the connection and insert, no further checking needed!
}
[/source]</p><p>As you can see the CreateRecord method contains .NET attributes that define the valid ranges of execution for the parameters, and has some other attributes that define pre- and post-conditions to be checked prior to execution. This allowed us to centrally manage a quite large framework (~20 classes, ~100 quite complex methods) and centralize the debugging, security, logging and parameter checking routines into a single location.</p><p>Moreover, using configuration files, one was able to &#8220;inject&#8221; (&#8220;weave&#8221;, using AOP terminology) more behaviors into this mechanism without having to recompile the application (this is known nowadays as &#8220;Dependency Injection&#8221; or &#8220;Inversion of Control&#8221; &#8211; Fowler, 2004), while the core &#8220;facade&#8221; methods only contained specific instructions related to the business rules that they implemented. The code was lighter, easier to read and maintain, and largely self-documenting.</p><p>It is worth noting, however, that we used a very small subset of the functionality that AOP provides, that is, the runtime interception of messages sent to objects and weaving aspects before and after method execution:</p><blockquote> &#8220;Another way to understand AOP is in terms of how it works. A common misconception associated with this perspective is to equate all of AOP with just one part of the supporting mechanisms. This kind of error is analogous to saying that OOP is just abstract data types.
Probably the most common mechanism error is to equate AOP with interceptors.(&#8230;)
It’s true that AOP does use functionality like interceptors and Lisp advise. It also incorporates techniques from reflection, multiple inheritance, multi-methods and others. However, AOP has an explicit focus on crosscutting structure and the modularization of crosscutting concerns.
The rule of thumb to avoid mechanism errors? Remember that AOP is more than any one mechanism—it’s an approach to modularizing crosscutting concerns that’s supported by a variety of mechanisms, including pointcuts, advice and introduction.&#8221;</blockquote><p>(Kiczales, 2004)</p><p>It is also interesting to note some drawbacks of our approach:</p><ol><li><strong>Performance:</strong> the runtime interception of method calls was heavy, increasing the processing time of each method execution by a factor of 10. However, since our system was not designed for a high number of users or transactions per second, this was not a showstopper, at least not in that phase of the project.</li><li><strong>Debugging:</strong> the Visual Studio .NET 2002 debugger (used at the beginning of the project) had some trouble figuring out the &#8220;jumps&#8221; between the &#8220;normal&#8221; methods and the AOP code (which we called &#8220;hyperspace&#8221; in our jargon, since the call stack of the woven code seemed to appear and disappear completely, almost magically, while debugging). The workaround was to set up breakpoints in strategic places around the code.</li><li><strong>Microsoft support:</strong> while .NET provides basic AOP capabilities, such as message interception and context-bound objects, these features were undocumented and explicitly unsupported; however, the code created back in 2002 still compiles perfectly well in later versions of the .NET Framework.</li></ol><p><strong>Conclusion</strong></p><p>AOP helped us build an extremely productive environment, and in only 4 months we had a working system (4 people were working full-time on the project), providing an enormous amount of functionality, and enhancing the current system.</p><p>Unfortunately, though, the project could not be finished completely, since the venture capital group that financed the company cut all financing unexpectedly and all development tasks stopped in 2003. 
Nevertheless, it was a highly rewarding experience, that gave me a practical insight into complex OOP, AOP and architectural design, as well as into project management techniques.</p><p><strong>References</strong></p><p>Fowler, Martin, &#8220;Inversion of Control Containers and the Dependency Injection pattern&#8221; [Internet] <a
href="http://www.martinfowler.com/articles/injection.html">http://www.martinfowler.com/articles/injection.html</a> (Last accessed July 2nd, 2006)</p><p>Hilsdale, Erik; Kiczales, Gregor et al., &#8220;Aspect-Oriented Programming with AspectJ™&#8221;, Xerox Corporation, <a
href="http://www.ccs.neu.edu/research/demeter/course/w03/lectures/lecAspectJ-w03.ppt">http://www.ccs.neu.edu/research/demeter/course/w03/lectures/lecAspectJ-w03.ppt</a> (Last accessed July 2nd, 2006)</p><p>Jacobson, Ivar; Ng, Pan-Wei, &#8220;Aspect-Oriented Software Development with Use Cases&#8221;, ISBN 0-321-26888-1, Addison-Wesley, 2005.</p><p>Kiczales, G., Lamping, J., Mendhekar, A., Maeda, C., Videira Lopes, C., Loingtier, J.-M., and Irwin, J, &#8220;Aspect Oriented Programming&#8221;, Springer-Verlag, 1997, [Internet] <a
href="http://www.cs.ubc.ca/~gregor/papers/kiczales-ECOOP1997-AOP.pdf">http://www.cs.ubc.ca/~gregor/papers/kiczales-ECOOP1997-AOP.pdf</a> (Last accessed July 2nd, 2006)</p><p>Kiczales, Gregor, &#8220;Common Misconceptions&#8221;, Dr. Dobb&#8217;s Magazine, February 10th, 2004, [Internet] <a
href="http://www.ddj.com/showArticle.jhtml?articleID=184415113">http://www.ddj.com/showArticle.jhtml?articleID=184415113</a> (Last accessed July 2nd, 2006)</p><p>Shukla, Dharma; Fell, Simon; and Sells, Chris, &#8220;Aspect-Oriented Programming Enables Better Code Encapsulation and Reuse&#8221;, MSDN Magazine, March 2002 [Internet] <a
href="http://msdn.microsoft.com/msdnmag/issues/02/03/aop/">http://msdn.microsoft.com/msdnmag/issues/02/03/aop/</a> (Last accessed July 2nd, 2006)</p> ]]></content:encoded> <wfw:commentRss>http://kosmaczewski.net/aop-the-dataservices-project/feed/</wfw:commentRss> <slash:comments>3</slash:comments> </item> <item><title>Reducing Code Entropy</title><link>http://kosmaczewski.net/reducing-code-entropy/</link> <comments>http://kosmaczewski.net/reducing-code-entropy/#comments</comments> <pubDate>Sun, 18 Mar 2007 09:27:20 +0000</pubDate> <dc:creator>Adrian</dc:creator> <category><![CDATA[Architecture]]></category> <category><![CDATA[Opinion]]></category> <category><![CDATA[C++]]></category> <category><![CDATA[Code]]></category> <guid
isPermaLink="false">http://kosmaczewski.net/2007/03/18/reducing-code-entropy/</guid> <description><![CDATA[This is a rant: I am tired of seeing virtual methods implemented in child classes that, at some point or another, call the method of the same name in the base class. For me this is a sign of poor &#8230; <a
href="http://kosmaczewski.net/reducing-code-entropy/">Continue reading <span
class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<p>This is a rant: <strong>I am tired of seeing virtual methods implemented in child classes that, at some point or another, call the method of the same name in the base class.</strong> For me this is a sign of poor architecture. <a
href="http://c2.com/xp/CodeSmell.html">A bad, bad smell in code</a>.</p><p>Let&#8217;s say that you have a base class, called, well, BaseClass (the examples are in C++, but you can take this idea to any other OO language):</p><p>[source:c]</p><p>class BaseClass
{
public:
// constructor, destructor, copy
// constructor, assignment operator&#8230;</p><pre><code>virtual void doSomething();
</code></pre><p>};</p><p>void BaseClass::doSomething()
{
// provide some basic behavior here,
// and a comment that says something like:</p><pre><code>// IMPORTANT NOTE to implementors of subclasses:
// this method should always be called before your code!
</code></pre><p>}</p><p>[/source]</p><p>And then you derive a class from this base class, called, well, DerivedClass:</p><p>[source:c]</p><p>class DerivedClass : public BaseClass
{
public:
// constructor, destructor, copy
// constructor, assignment operator&#8230;</p><pre><code>virtual void doSomething();
</code></pre><p>};</p><p>void DerivedClass::doSomething()
{
BaseClass::doSomething();    // WTF???</p><pre><code>// do something else...
</code></pre><p>}</p><p>[/source]</p><p>As stupid as it might sound, this code is hard to understand, debug and maintain, because it includes an unneeded dependency from the child to the base class &#8211; that is, another one, besides the fact that one derives from the other!</p><p><span
id="more-742"></span>The problem is that the &#8220;doSomething()&#8221; message is actually <strong>two messages</strong> here: one which is mandatory and generic (like initialization or verification), and the other, a specialized one (which is why people use polymorphism, actually). One is public, the other is not. Mixing both messages creates a semantic problem in the above code, which ultimately makes the code harder to understand.</p><p>This might lead to a myriad of other small problems; what if another developer (who has just joined the team) has to add a new child class of BaseClass months later, and forgets to (or just does not know that she should) call the BaseClass::doSomething() method? This might cause pain, either because it generates a bug that&#8217;s hard to track down, or because the developer spends time trying to understand why the new class does not work while she&#8217;s working on it. In any case, it is a cost that you, as an architect, might have avoided in the first place. And if your designs are poorly documented (which is a sad truth), or if the developer that created the code has left the team (which happens a heckuva lot of times every day), well, your chances of losing time and effort in your project will grow accordingly.</p><p>In my opinion, such an approach goes against the very idea of polymorphism, and I prefer to refactor such code this way:</p><p>[source:c]</p><p>class BaseClass
{
public:
// constructor, destructor, copy
// constructor, assignment operator&#8230;</p><pre><code>void doSomething();
</code></pre><p>protected:
// a pure virtual, abstract method
// without implementation, working as a
// &#8220;hook&#8221; for subclass implementors
virtual void doSomethingElse() = 0;
};</p><p>void BaseClass::doSomething()
{
// provide some mandatory, basic behavior here</p><pre><code>// hook for subclasses to extend this behavior!
doSomethingElse();
</code></pre><p>}</p><p>[/source]</p><p>And then:</p><p>[source:c]</p><p>class DerivedClass : public BaseClass
{
public:
// constructor, destructor, copy
// constructor, assignment operator&#8230;</p><p>protected:
// provide some extended behavior!
virtual void doSomethingElse();
};</p><p>void DerivedClass::doSomethingElse()
{
// do something else&#8230;
}</p><p>[/source]</p><p>Ahhh&#8230; I much prefer this way of doing things:</p><ol><li>First of all, it makes BaseClass an abstract entity, an interface, a pure thought creation, a divine entity above all the earthling vacuum; you want to program against interfaces, and not be tempted to use them as concrete things. That&#8217;s what subclasses are for.</li><li>The mandatory code in BaseClass::doSomething() will always, always (let me repeat that, <strong>always</strong>) be executed, no matter how sloppy the creator of DerivedClass is. The worst thing you can have is an empty DerivedClass::doSomethingElse() implementation. In that case, no harm.</li><li>If the developer is so, so, so sloppy that she forgets to implement DerivedClass::doSomethingElse(), well, guess what: at some point in time, either the compiler or the linker will complain! And this is a <strong>good thing</strong> (at least in languages that feature a compiler, of course!)</li><li>You&#8217;ve unknowingly created a nice API, by the way. You can document this with a good UML diagram, and it will stand out in your Doxygen or JavaDoc documentation.</li><li>It&#8217;s easy to understand and hard to break, as all good designs should be.</li><li>Your developers will love your work, and your karma gets a bonus point. In the next life you might as well approach a higher level of consciousness and save Mankind from its horrible destiny. (OK, maybe this is too much, but hey, who knows?)</li><li>The approach is called the <a
href="http://www.exciton.cs.rice.edu/JavaResources/DesignPatterns/TemplatePattern.htm">&#8220;Template Method Design Pattern&#8221;</a>, and this will add another bonus, but this time for your CV    ;)</li></ol><p>Hope this helps! Any comments welcome, as usual.</p><p><a
href="http://flickr.com/photos/nothingpersonal/252531721/"><img
src='http://kosmaczewski.net/wp-content/uploads/2007/03/252531721_c9d8aef994.jpg' alt='252531721_c9d8aef994.jpg' /></a></p> ]]></content:encoded> <wfw:commentRss>http://kosmaczewski.net/reducing-code-entropy/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Hardware Polymorphism</title><link>http://kosmaczewski.net/hardware-polymorphism/</link> <comments>http://kosmaczewski.net/hardware-polymorphism/#comments</comments> <pubDate>Sat, 08 Apr 2006 10:43:29 +0000</pubDate> <dc:creator>Adrian</dc:creator> <category><![CDATA[Architecture]]></category> <category><![CDATA[Opinion]]></category> <category><![CDATA[Papers]]></category> <guid
isPermaLink="false">http://kosmaczewski.net/2006/04/08/hardware-polymorphism/</guid> <description><![CDATA[Since data and instructions are stored in RAM in pretty much the same way, a priori the CPU cannot distinguish each other, but by the cycle in which the binary chunk is fetched from memory. In the case of instructions, &#8230; <a
href="http://kosmaczewski.net/hardware-polymorphism/">Continue reading <span
class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<p>Since data and instructions are stored in RAM in pretty much the same way, the CPU cannot <em>a priori</em> tell one from the other, except by the cycle in which the binary chunk is fetched from memory. In the case of instructions, it then needs to decode the operation codes into instructions, with the added problem that if the operation is performed on data that is not implied by the operation code, the results are wrong or even catastrophic.</p><p>The question is: would it be useful if, in hardware, each cell of data carried its own type designation? I will discuss here the pros and cons of this approach, with respect to hardware and software architectures. <span
id="more-126"></span></p><h2>Introduction</h2><p>The example of the + sign is particularly interesting; here&#8217;s an excerpt of a tutorial for the Ruby programming language, where the need for data types appears in a straightforward way:</p><blockquote><em> &#8220;Before we get any further, we should make sure we understand the difference between numbers and digits. 12 is a number, but '12' is a string of two digits.
Let&#8217;s play around with this for a while:
puts  12  +  12
puts '12' + '12'
puts '12  +  12'
(results)
24
1212
12  +  12
How about this:
puts  2  *  5
puts '2' *  5
puts '2  *  5'
(results)
10
22222
2  *  5&#8221; </em></blockquote><p>(Chris Pine, 2006)</p><p>As we can see in the above example, higher-level programming languages allow us to distinguish (if not explicitly like Java, contextually like Ruby) among different types of information, whereas, at the hardware level, this distinction does not exist; the processor executes or processes instructions or data depending on the processor cycle.</p><h2>Metadata</h2><p>In other words, if meaningful information (data) is stored in the memory of the computer and can be processed by a computer program, then we can say that every bit (no pun intended) of information has, at least at a certain abstraction level, and from a certain point of view, a particular type, or, more generally, some metadata attached to it:</p><blockquote><em> &#8220;Metadata (Greek: meta- + Latin: data &#8220;information&#8221;), literally &#8220;data about data&#8221;, is information that describes another set of data. A common example is a library catalog card, which contains data about the contents and location of a book: It is data about the data in the book referred to by the card. Other common contents of metadata include the source or author of the described dataset, how it should be accessed, and its limitations.&#8221; </em></blockquote><p>(Wikipedia, 2006)</p><p>In our case, the type of a particular piece of data is part of its metadata:</p><blockquote><em> &#8220;Assigning datatypes (&#8220;typing&#8221;) has the basic purpose of giving some semantic meaning to otherwise meaningless collections of bits.&#8221; </em></blockquote><p>(Wikipedia, 2006)</p><h2>(Imaginary) Type-checking processor architecture</h2><p>Now, let&#8217;s suppose that a certain processor architecture allows us to distinguish in-memory pieces of data from in-memory program instructions. 
How could this be implemented?</p><p>First of all, let&#8217;s see the different primitive types of information that a processor could distinguish:</p><ul><li>Integer numbers (of different but defined lengths such as 8, 16, 32 or 64 bits)</li><li>Floating-point numbers (again, of different but defined lengths)</li><li>Single characters (single- or multibyte-characters, such as Unicode ones)</li><li>Strings (of variable lengths)</li><li>Pure binary streams (images, audio, video)</li><li>Uniform arrays or vectors (of variable length, where the items are all of the same type)</li><li>Variable arrays or vectors (of variable length, where the individual items can be of any type, similar to C structures)</li></ul><p>Let&#8217;s imagine, to begin, that this is the definitive list of types supported at the hardware level by a certain microprocessor architecture. I have kept this list particularly close to that of any common high-level language, for reasons that will become obvious in a while.</p><p>How could the type metadata be stored at the hardware level? The easiest way to imagine this is having a supplemental byte at the beginning of each in-memory variable or structure, indicating the type. This would give us the possibility of referencing 256 different types of data, which is more than enough in this particular example.</p><p>In the case of variable-length data types as shown above (strings, arrays), another byte (or bytes) should indicate the length of the whole data structure. The need for this will be explained below.</p><h2>Type-checking</h2><p>Now, during execution, the processor would fetch data from memory, but this time it would have a first byte of information about the information (metadata) indicating the type of what follows, and possibly some length information as well. Instead of having to rely on context (which is the case today), the processor could proactively check that the data will be processed by the appropriate instructions. 
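</p><p>The tag byte and the check described above can be sketched in software. The following C++ fragment is only a simulation under stated assumptions: the Tag values, the Memory alias and the functions store_cell, store_int32, store_float32, load_int32 and add_int32 are all hypothetical names, and a real design would perform the check in hardware rather than in code:</p>

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical type tags: one metadata byte allows up to 256 of them.
enum class Tag : std::uint8_t { Int32 = 1, Float32 = 2 };

// Simulated RAM: a flat sequence of bytes.
using Memory = std::vector<std::uint8_t>;

// Appends one tag byte followed by the raw bytes of `value`;
// returns the address of the tag byte.
template <typename T>
std::size_t store_cell(Memory& mem, Tag tag, const T& value) {
    const std::size_t address = mem.size();
    mem.push_back(static_cast<std::uint8_t>(tag));
    std::uint8_t raw[sizeof(T)];
    std::memcpy(raw, &value, sizeof(T));
    mem.insert(mem.end(), raw, raw + sizeof(T));
    return address;
}

inline std::size_t store_int32(Memory& mem, std::int32_t value) {
    return store_cell(mem, Tag::Int32, value);
}

inline std::size_t store_float32(Memory& mem, float value) {
    return store_cell(mem, Tag::Float32, value);
}

// Checked load: refuses to hand out the payload unless the tag matches.
inline std::int32_t load_int32(const Memory& mem, std::size_t address) {
    constexpr std::size_t cell_size = 1 + sizeof(std::int32_t);
    if (address >= mem.size() || mem.size() - address < cell_size) {
        throw std::out_of_range("address outside simulated memory");
    }
    if (mem[address] != static_cast<std::uint8_t>(Tag::Int32)) {
        throw std::runtime_error("type fault: cell does not hold an Int32");
    }
    std::int32_t value;
    std::memcpy(&value, mem.data() + address + 1, sizeof value);
    return value;
}

// Imaginary integer ADD: both operands are type-checked before the addition.
// (Overflow handling is left out of this sketch.)
inline std::int32_t add_int32(const Memory& mem, std::size_t a, std::size_t b) {
    return load_int32(mem, a) + load_int32(mem, b);
}
```

<p>With this sketch, storing 12 twice and adding the two cells yields 24, while adding an integer cell to a floating-point cell raises the type fault instead of silently reinterpreting the bits. 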
This is a common technique in programming languages, called &#8220;Type Checking&#8221;:</p><blockquote><em> &#8220;The process of verifying and enforcing the constraints of types &#8211; type checking &#8211; may occur either at compile-time (a static check) or run-time (a dynamic check). Static type-checking becomes a primary task of the semantic analysis carried out by a compiler. If a language enforces type rules strongly (that is, generally allowing only those automatic type conversions which do not lose information), one can refer to the process as strongly typed, if not, as weakly typed.&#8221; </em></blockquote><p>(Wikipedia, 2006)</p><p>In the case of a processor-based type check, it would be a purely strong, dynamic one.</p><p>Of course, this introduces the first drawback of this approach: while current processor architectures bypass this check and blindly trust the &#8220;context coherence&#8221; between data and instruction, a type-checking processor would have to execute an internal check to verify them prior to executing each instruction. A smarter approach would be to have the processor &#8220;trust&#8221; some executing code, for example if it comes from a statically-typed compiler; this way, the processor would skip the type check and behave like current systems; however, it would execute a supplemental check for code coming from dynamically-typed languages (such as scripting languages).</p><h2>Security</h2><p>A direct benefit of type-checking processor architectures such as the one described above has to do with security. One of the most common security problems in software today is inherent to the von Neumann architecture, in which data and instructions are both loaded in memory and share adjacent locations. 
This security problem is known as &#8220;Buffer Overrun&#8221; or &#8220;Stack Overrun&#8221;:</p><blockquote><em> &#8220;A stack-based buffer overrun occurs when a buffer declared on the stack is overwritten by copying data larger than the buffer. Variables declared on the stack are located next to the return address for the function&#8217;s caller. The usual culprit is unchecked user input passed to a function such as strcpy, and the result is that the return address for the function gets overwritten by an address chosen by the attacker. In a normal attack, the attacker can get a program with a buffer overrun to do something he considers useful, such as binding a command shell to the port of their choice&#8221;. </em></blockquote><p>(Howard &amp; LeBlanc, 2003, page 129)</p><p>(Howard &amp; LeBlanc follow this statement with a C program that shows the security failure, and explain how it might be exploited to inject code into the computer program and change its behavior.)</p><p>In the case of the stack buffer overrun, the problem is not only that current processors do not check the type of the data they process, but that they do not even check its length. 
Length checks should then be the first check that a processor performs before processing in-memory data of variable length (as stated above, strings, arrays and structures like C structs fall into this category).</p><p>This leads us to infer that a type-checking processor would be particularly useful in security-intensive environments (such as nuclear plants, life support systems, etc.) where trading performance for an additional type check could be highly desirable.</p><h2>Virtual Machines</h2><p>Virtual machines such as the Java Virtual Machine or .NET&#8217;s Common Language Runtime (CLR) perform both length and type checks; in those cases, the virtual machine specification ensures that the code being executed does not perform illegal memory accesses or reference objects of the wrong type at the wrong moment.</p><p>For example, .NET&#8217;s CLR includes a runtime security engine that constantly checks code metadata to determine whether method calls are trusted or not:</p><blockquote><em> &#8220;Permission demands propagate up the stack. When a method call demands a particular type of permission, the security engine must affirm that every component in the stack (prior to the point of the permission demand) has appropriate permissions. If any component does not, the permission demand fails and an exception is thrown to signify this failure. Each frame of the stack can modify the effective set of permissions by calling Assert, Deny or PermitOnly before making calls, and there are also calls to Revert changes made earlier. 
Taken together, this mechanism results in aggregate behavior that is constrained by the least privileged component that is participating in a given stack region&#8221; </em></blockquote><p>(Stutz, Neward &amp; Shilling, 2003, page 185)</p><p>This, among other reasons (such as automatic memory management), makes virtual machines a &#8220;hot topic&#8221; in computing nowadays, since they make it possible to develop much more secure systems, with greater productivity and fewer resources.</p><p>Of course, it must be said, not all security problems have disappeared with virtual machines, but that&#8217;s another topic.</p><h2>Other benefits</h2><p>I think that a performance benefit could also come from having native string manipulation at the hardware level. String manipulation is one of the most common operations performed in high-level programming languages (Perl and Basic being typical examples), yet until now text strings as such exist only at that high level (and until recently, when the Standard Template Library appeared, C++ did not even have a native string type &#8211; http://www.bgsu.edu/departments/compsci/docs/string.html).</p><p>Common string operations that could be implemented at the hardware level include string copying, concatenation and splitting; this way, getting the length or a defined substring of a given string would need a single processor instruction, instead of the current procedures, which involve memory allocation and copying, both extremely expensive in time and resources:</p><blockquote><em> &#8220;In all cases, these functions consist of copying all or a subset of a string to another string. 
The specific steps are:<ol><li>Determine the number of characters to copy</li><li>Allocate space for the characters</li><li>Copy the characters to the new string</li></ol> Because of the memory allocation and copying operations involved, extracting sub-strings is also an expensive operation.&#8221; </em></blockquote><p>(VBIP.com, 2006)</p><h2>Conclusion</h2><p>The inclusion of high-level instructions in processors is nothing new. It boosts the speed of hardware architectures by letting common operations run at maximum speed at the lowest system level. One good example of an existing implementation is the Velocity Engine found in PowerPC G4 and G5 microprocessors:</p><blockquote><em> &#8220;The Velocity Engine, embodied in the G4 and G5 processors, expands the current PowerPC architecture through addition of a 128-bit vector execution unit that operates concurrently with existing integer and floating-point units. This provides for highly parallel operations, allowing for simultaneous execution of up to 16 operations in a single clock cycle. This new approach expands the processor&#8217;s capabilities to concurrently address high-bandwidth data processing (such as streaming video) and the algorithmic intensive computations which today are handled off-chip by other devices, such as graphics, audio, and modem functions. The AltiVec instruction set allows operation on multiple bits within the 128-bit wide registers. This combination of new instructions, operation in parallel on multiple bits, and wider registers, provide speed enhancements of up to 30x on operations that are common in media processing&#8221; </em></blockquote><p>(Apple Computer, 2006)</p><p>Some example code in C that uses the AltiVec instruction set is shown in <a
href="http://developer.apple.com/hardware/ve/tutorial.html">http://developer.apple.com/hardware/ve/tutorial.html</a></p><p>Of course, these implementations greatly impact compiler and operating system design, but they do not (I think) impact higher-level languages such as Java or C#, or scripting languages such as Perl or Ruby, which tend to be rather platform-independent (in both software and hardware).</p><h2>References</h2><p>Apple Computer, &#8220;Velocity Engine&#8221; [Internet], <a
href="http://developer.apple.com/hardware/ve/">http://developer.apple.com/hardware/ve/</a> (Accessed February 3rd, 2006)</p><p>Chris Pine, &#8220;Learn to Program&#8221; [Internet], <a
href="http://pine.fm/LearnToProgram/?Chapter=02">http://pine.fm/LearnToProgram/?Chapter=02</a> (Accessed February 3rd, 2006)</p><p>David Stutz, Ted Neward &amp; Geoff Shilling, &#8220;Shared Source CLI Essentials&#8221;, ISBN 0-596-00351-X, O&#8217;Reilly, 2003</p><p>Michael Howard &amp; David LeBlanc, &#8220;Writing Secure Code, 2nd Edition&#8221;, ISBN 0-7356-1722-8, Microsoft Press, 2003</p><p>VBIP.com, &#8220;String Operations&#8221; [Internet], <a
href="http://www.vbip.com/books/1861007302/chapter_7302_04.asp">http://www.vbip.com/books/1861007302/chapter_7302_04.asp</a> (Accessed February 3rd, 2006)</p><p>Wikipedia, &#8220;Datatype&#8221; [Internet], <a
href="http://en.wikipedia.org/wiki/Datatype">http://en.wikipedia.org/wiki/Datatype</a> (Accessed February 3rd, 2006)</p><p>Wikipedia, &#8220;Metadata&#8221; [Internet], <a
href="http://en.wikipedia.org/wiki/Metadata">http://en.wikipedia.org/wiki/Metadata</a> (Accessed February 3rd, 2006)</p> ]]></content:encoded> <wfw:commentRss>http://kosmaczewski.net/hardware-polymorphism/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Michael Platt&#8217;s definition of Architecture</title><link>http://kosmaczewski.net/michael-platts-definition-of-architecture/</link> <comments>http://kosmaczewski.net/michael-platts-definition-of-architecture/#comments</comments> <pubDate>Tue, 28 Mar 2006 20:02:44 +0000</pubDate> <dc:creator>Adrian</dc:creator> <category><![CDATA[Architecture]]></category> <guid
isPermaLink="false">http://kosmaczewski.net/2006/03/28/michael-platts-definition-of-architecture/</guid> <description><![CDATA[In his weblog, Michael Platt has posted an interesting article comparing different definitions of the word &#8220;Architecture&#8221;; an interesting read! http://blogs.technet.com/michael_platt/archive/2006/03/27/423300.aspx]]></description> <content:encoded><![CDATA[<p>In his weblog, Michael Platt has posted an interesting article comparing different definitions of the word &#8220;Architecture&#8221;; an interesting read! <a
href="http://blogs.technet.com/michael_platt/archive/2006/03/27/423300.aspx">http://blogs.technet.com/michael_platt/archive/2006/03/27/423300.aspx</a></p> ]]></content:encoded> <wfw:commentRss>http://kosmaczewski.net/michael-platts-definition-of-architecture/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> </channel> </rss>
