Java Embryo System

(a.k.a. jEmbryoS)

The operating system 100% written in Java


 The goal
 Problem definition
 The Ideal System
 Steps to be taken
 Optimizing compiler
 Architecture improvements
 Drivers
 Auxiliary software
 Hardware
 Potential usage
 Business target (executive summary)
 Architecture
 High level view
 The system layer
 The service layer
 The application layer
 100% Java
 Data structures
 Method call implementation
 Memory management
 Memory allocation
 Garbage collection
 Hardware management
 Interrupts
 Driver architecture
 Task management
 Security
 Java API compliance
 Technologies
 Assembler
 Compiler
 Bootloader
 System image
 Debugging
 Testing
 Results achieved
 Projects and dependencies
 Performance and limitations
 System installation
 Console application usage
 How to extend
 Quality expectation disclaimer

It began with a vague feeling of discomfort, and maybe some frustration, when many things around me simply refused to do what they should. One such moment gave birth to the Embryo. But what is the problem? If the world isn't ideal, that is no excuse for a great fuss! Maybe, but ...

The goal

Why can't the world be better? My thought is that it should be. And if there is a problem, it should be eliminated as soon as possible. But that is not easy. Why? Because we, the people, refuse to refactor our world the way many programmers refactor code every day. And here is the goal: at least in the area of software systems, things should not bother you. And a good starting point for software is an operating system, isn't it?

In short: the ultimate measure of success for the proposed operating system is a production-ready version with a number of advantages achieved. The details of its capabilities are given a bit later.

Problem definition

Why don't people refactor their world? There is a ready answer: the world is too complex. But why is it so complex? My bet is simple: it is a great legacy slum that prevents us from doing many things right. Here, there and everywhere we hear the same explanation for wrong things: it costs too much to change anything because of the widespread adoption of a technology, because of markets, because of the many kinds of trash we have created ourselves. Haven't we? Now look at the "Java way". It was not an overkill technology, but it had (and has) support. And now it stands as one of the most important software development environments. That support was there not just because of money. It was granted by the people Java attracted, and they made Java what it is. Why aren't they doing it again?

Now the legacy. It is hiding right in front of you, in your computer. The computer, of course, is a good enough thing to use, and we use it extensively. But sometimes things inside it again refuse to do something useful. It is time to put an end to such unpleasant behavior. The question is how. The answer is that the computer must be simple. Then we can quickly correct any misbehaving thing in it. Now imagine: we see a problem, open our preferred text editor, write a few lines of code, and Windows works much better! Is it possible? I am afraid that with Windows we are very far from such a scenario.

I am a Java programmer and I see the world from the Java perspective. When it takes a few lines of Java code to make something good work better, that is really attractive. And there are millions of Java programmers in the world. But most of them will never work on "the better Windows". Not just because of Microsoft, but because of the unfriendly environment. How long does it take to learn not only about pointers, stacks and memory allocation, but also about IDEs, debuggers, libraries and so on? And add to this mess the C-like coding style with THE_VRY_NICE_CNST_AT_EVRY_CRNR. Yeah, it looks great! That is why there should be a Java operating system. The great community of Java programmers really can make the world better.

But besides the Java world there are more things to consider. Have you ever looked at all those registers and interrupts of the hardware world? It is a nice picture. The great legacy slum could not be more interesting. First, there are the legacy means of getting an operating system loaded into memory. For that to work you should know about disk sectors, partitions and file systems, about different BIOSes, about DOS-era functions called interrupts, about very old operating modes of Intel processors, about 30-year-old memory layouts, and about IO ports that can reboot your computer if you write one extra bit while turning on another almost 40-year-old thing called the A20 line. Yes, legacy hardware is a very funny thing. But why is it such a mess? Because there are markets, there are clients, there are vendors, there are ... OK, OK, all those things are there, but what is the solution? The solution is Java. It is very simple: every day every C programmer produces another anchor for all that legacy slum, because all the code is compiled into machine instructions, and those instructions often work with such nasty things as the A20 line mentioned above. If a much better system appears, all those instructions will refuse to work. Now imagine the terabytes of such instructions coded by millions of C programmers. All of them will refuse to work when there is a new processor or a new system. Now the "Java way". Java programmers write machine-independent code. What if there were a system that required no C code to perform its functions, only machine-independent Java bytecode?

In short: to fight the complexity of existing systems we need some means of decoupling old and dirty hardware solutions from the software world. The best tool for this is Java, with its hardware independence and very deep market penetration. A Java OS with a very simple and clearly defined architecture can be a very good starting point.

The Ideal System

The Ideal System should be simple. I hope that is obvious. And to be simple, the Ideal System should solve the great legacy slum problem. With it solved, we can pave the way for millions of programmers to start making the world better. Another point: the system should have some market advantages. With such advantages we can attract IT business to support the system. In short: simplicity, community, advantages.

The simplicity is very simple ;) Just wrap the alien hardware under the Java Operating System layer. Any new hardware needs to be integrated with the system just once, by the OS development team or by driver developers. Most applications the system can have depend on the Java Language Specification, and a few may depend on the Java Virtual Machine Specification. There is no problem of millions of applications tied to Intel processor instructions and things like the A20 line. But are we replacing one nasty thing with another? No, because the Java Operating System opens the way for many Java programmers to implement or improve applications at a much deeper level than they can today. And not only applications, but a friendly environment as well. Can a Java programmer do something serious on the Windows platform? On the Java OS platform a Java programmer can do much better.

Next, there is the architecture. The Java world has delivered a lot of architectural innovations and given birth to a range of technologies, from mature to very promising. All this potential can work in the field of OS development!

Another point here is the hardware. If it takes only a few human-years to bring to market an absolutely new computer system with interesting capabilities and a huge code base of existing applications, is that a small advantage? Today only entities like Microsoft, Intel or Google can introduce something new to the world. What a waterfall of inventions we could have if a completely new system required an investment of just a few human-years! For illustration, imagine how many human-years are required to port Windows and most of its applications to the ARM or PowerPC processor. Now the question: is it worth replacing an exclusively Intel-driven system with a multitude of very interesting systems?

A bit more about advantages. There is a speed issue worth considering. What limits the performance of an Intel-based system? It is the problem of simultaneous execution. A program should be written in a form that can easily be split among multiple processor cores, but most programs are not. Intel understands this and works on superscalar technology, which attempts to analyze incoming instructions and execute them simultaneously. Some performance gains are, of course, achieved, but the limit here is the processor itself. It just does not have enough intellectual power. A powerful analyzer requires a lot of memory and loads a processor to 100% for a relatively long time. Does the processor have the capacity to drop its job and only analyze instructions? Does it have dedicated memory for such a task? The processor also lacks structural information: it does not know that the loop it executes is just a means of introducing a delay until some hardware completes its duty. And there are many similar things. What to do, then? The solution is the Java OS. The Java OS has a lot of memory for any analysis, it has the time to perform the task, and it already has the structural information, both in the form of its own well-known actions and in the form of the program structure, which in Java is split into high-level methods. With all these means a Java-based OS should have a great performance advantage over existing hardware solutions.

An even greater performance gain can be achieved by developing specialized hardware for the heavily optimized machine code produced by the Java OS. This subject is covered a bit later.

In short: from the simplicity point of view, the single specification of the Java Virtual Machine is much better than the instruction set specifications of all existing processors. It is very promising to attract the Java community to Java environment development at the lowest possible level. And it is obvious that introducing a new processor would be quite affordable for many enterprises if there were a Java OS. Also, looking at the issue of speed, we can better understand the need for new hardware. Now the ideal: a simple (legacy slum minimized), widely supported (especially by the Java community) and new-hardware-friendly operating system.

Steps to be taken

The most promising step here is to unite the few Java OS development teams under the common goal of a simple and flexible operating system. And do not worry about competition: once big business sees the results, it will immediately spend a lot of money on the issue. But until that is accomplished there are some problems with Java OS development. First, there is no goal of making the best OS. For example, the jNode OS states that its goal is just to do something in Java. Is that a promising goal? Another problem: the experimental Java OS from Sun (and a few other systems) has a very important part not implemented in Java. Does that make it easier to attract new Java OS developers? And finally the old problem, so familiar to almost all programmers: there is no good documentation for any of those operating systems.

To address those deficiencies we need a good goal, we should implement the low-level part of the OS in Java, and we should write comprehensive documentation. jEmbryoS is an attempt in this direction. But it is not easy to create the Ideal System, at least in terms of the human-years required. That is why jEmbryoS is still in its infancy. At the moment (the beginning of 2014) jEmbryoS exists in the form of a very simple technology demonstration system. It has some useful components developed, but lacks a number of things without which it just cannot be called an operating system. As such a "not real operating system", the set of developed technologies can be called an Embryo System. Like a real embryo, the system has almost all the required parts, but the parts are small and not always ready to live in the real world.

In short: the Java OS development teams should be united. A starting point for such a process can be jEmbryoS, which can be extended or used as a technology incubator and a source of spare parts for The Best Java Operating System.

Optimizing compiler

The most important part of a truly best Java operating system is an optimizing compiler. It is the key to system performance. The current implementation of jEmbryoS directly transforms Java bytecode into machine instructions without any optimization at all. The bytecode generated by the Java compiler is far from optimal because of its universality. It operates with additional entities such as the stack and the variable storage. All those things are just a simplified abstraction, not an optimized tool for real-world tasks. Existing JVMs, of course, use the bytecode only as a starting point in a long quest for better optimization. All those optimization algorithms should be developed for the Java Embryo System for it to become a mature animal in the software world.

Two optimization directions can be highlighted: the machine code level and the logical level. At the machine code level the bytecode inefficiency should be eliminated. Very often directly translated code does absolutely stupid things. For example, the JVM specification requires all parameters of a method to be placed on the stack, so the Java compiler produces bytecode that reads a variable from the variable storage and pushes the value onto the stack, and the actual method call then pops the value. Here the push and pop steps are in fact absolutely needless: the system can just read the variable from memory into a processor register and leave it there while performing the method call. In the optimized case there should be just one machine instruction instead of maybe 20 instructions with direct bytecode translation.
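
The needless push and pop described above is the classic target of a peephole optimization pass. Below is a minimal sketch of such a pass, not code taken from jEmbryoS. The toy instruction names (LOAD, PUSH, POP, MOV, CALL), the register operands and the class PeepholeSketch are assumptions made only for illustration. The pass replaces every adjacent PUSH/POP pair with a single register move.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy instruction model; it is NOT the jEmbryoS compiler.
final class PeepholeSketch {

    /** One toy machine instruction, for example "PUSH eax". */
    record Insn(String op, String operand) {
        @Override
        public String toString() {
            return op + " " + operand;
        }
    }

    /**
     * Replaces each adjacent "PUSH x; POP y" pair with "MOV y, x".
     * When x and y are the same register the pair is dropped entirely.
     * Assumes both operands are plain registers.
     */
    static List<Insn> optimize(List<Insn> code) {
        List<Insn> out = new ArrayList<>();
        int i = 0;
        while (i < code.size()) {
            Insn cur = code.get(i);
            boolean pushPopPair = i + 1 < code.size()
                    && cur.op().equals("PUSH")
                    && code.get(i + 1).op().equals("POP");
            if (pushPopPair) {
                String src = cur.operand();
                String dst = code.get(i + 1).operand();
                if (!src.equals(dst)) {
                    out.add(new Insn("MOV", dst + ", " + src));
                }
                i += 2; // both instructions of the pair are consumed
            } else {
                out.add(cur);
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Roughly what a direct translation of "load variable, pass as argument" could look like.
        List<Insn> direct = List.of(
                new Insn("LOAD", "eax, [local0]"),
                new Insn("PUSH", "eax"),
                new Insn("POP", "ebx"),
                new Insn("CALL", "someMethod"));
        optimize(direct).forEach(System.out::println);
    }
}
```

A real compiler would also have to prove that nothing else reads the stack slot written by the PUSH; the sketch ignores that and only shows the shape of the transformation.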

The logical level of optimization is much greater in its scope and in the number of algorithms used. The source code level is expected to be the most important place for optimization, but all the work behind the scenes can also give a very good speed increase. Source-level optimization includes such methods as method inlining, loop optimization, data flow analysis and many others.

After high-level optimization of the incoming bytecode comes optimization of the bytecode actions. The most important part of this is the reduction of memory allocations. Every allocation requires not only that the memory management code update its data structures, but also the garbage collection part of the memory manager's work. By reducing or optimizing memory allocation we can achieve a very good speed increase.

Another internal responsibility of the system is the management of locally used variables. The JVM specification requires the Java compiler to produce bytecode operating with two kinds of variable storage: the bytecode stack and the index-based storage of variables. It is not optimal to use those storages directly, but for the sake of simplicity the current implementation has both facilities implemented in a straightforward manner. Because of automatic memory management, the bytecode stack and variable storage are supplemented with additional entities that distinguish objects, which require the memory manager's attention, from primitive types, which occupy only the space of the bytecode stack or the variable storage. All the memory structures for variable management should be optimized, first to reduce operation overhead and second to get a smaller memory layout.

Besides internal state management there is one Java-specific area of optimization: the multitude of checks that ensure stable and robust system operation. All those checks for null pointers, array indexes out of bounds and so on could be executed only once while the variable does not change its value, but a straightforward implementation of the JVM specification performs the check every time the variable is accessed. Removing the redundant checks is another way to get the system to the top of the industry performance list.
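
The idea of executing a check once instead of on every access can be illustrated with a small sketch. It is an assumption-laden model, not jEmbryoS code: the class CheckHoistingSketch, the method names and the counter are hypothetical, and the explicit checkAccess call only stands in for the check that directly translated machine code would contain (the JVM running this sketch still performs its own implicit checks).

```java
// Hypothetical model of per-access checks versus one hoisted check.
final class CheckHoistingSketch {

    /** Counts explicit checks so the two variants can be compared. */
    static long checksPerformed = 0;

    /** Stand-in for the check a direct translation emits before every array access. */
    static void checkAccess(int[] array, int index) {
        checksPerformed++;
        if (array == null) {
            throw new NullPointerException("array is null");
        }
        if (index < 0 || index >= array.length) {
            throw new ArrayIndexOutOfBoundsException(index);
        }
    }

    /** Direct translation: one check per loop iteration. */
    static long sumChecked(int[] array, int count) {
        long sum = 0;
        for (int i = 0; i < count; i++) {
            checkAccess(array, i);
            sum += array[i];
        }
        return sum;
    }

    /**
     * Optimized variant: the array reference and the index range 0..count-1
     * do not change inside the loop, so a single check before the loop covers
     * every access. The check is skipped when the loop body would never run,
     * so the same exceptions are thrown in the same situations.
     */
    static long sumHoisted(int[] array, int count) {
        if (count > 0) {
            checksPerformed++;
            if (array == null) {
                throw new NullPointerException("array is null");
            }
            if (count > array.length) {
                throw new ArrayIndexOutOfBoundsException(array.length);
            }
        }
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += array[i];
        }
        return sum;
    }
}
```

Hoisting is only safe here because the loop has no side effects visible before the exception would be thrown; a real optimizer has to prove that before moving a check.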

In short: effort is required to create a good optimizing compiler. Its work can be split among machine code optimization, language-level optimization and optimization of the supporting 'behind the scenes' procedures.

Architecture improvements

The architectural part of system design is a never-ending process of assessing and implementing improvements, for which there is always a need. At its current level the system represents a compromise between simplicity and speed of development. Simplicity is very important when there is a hope of attracting other developers to the design and improvement of the system. But a simple solution often requires some additional components to be developed. Being a hobbyist in the field of OS development, I of course have limited time to spend on better design. As a result the system is not as good as somebody may expect, and additional effort is required to make it better.

At the moment the most demanding areas of architectural work are the hardware-related components and context switching capabilities such as the scheduler and interrupts. The hardware, first of all, requires a lot of drivers. The second problem follows from the first: expertise in particular devices is required. Third, overall hardware management can be improved by rethinking its existing state. The second demanding direction is the context switch side of the system. The problems here are the efficiency of interrupt processing and new scheduler algorithms with better capabilities.

Another important architectural area is memory management. It includes garbage collection (memory reclamation) and memory allocation. Many improvements are waiting to be implemented in the area of memory allocation. At present it does not always perform efficiently and is burdened with critical-section synchronization between the competing allocator and garbage collector. Beyond the current implementation, there are many algorithms worth considering as a means of memory management; that is another area to work on.

Testing and debugging are also very important parts of OS development. Here we need a pure Java hardware emulator. Without it, testing and debugging require much more effort. It is possible to use existing C-based emulators (and they are actually used), but integrating them with the Java OS development environment is far from simple. For example, the existing versions of the Qemu emulator do not allow ordinary debugging from within a Java IDE because of the very large communication overhead. There are existing Java solutions for hardware emulation, the JPC and Dioscuri emulators, but both lack documentation and are not ready to be used out of the box for Java OS development.

In short: the architectural part of the system is not ideal. The most promising directions for architectural innovation are overall system simplification, hardware management, memory management, context switch algorithms and the development of a pure Java hardware emulator.

Drivers

For jEmbryoS to be interesting from the business perspective, it needs a set of drivers to support network communication and hardware access. Without network support the system will never gain any momentum. This means a demand for network card drivers, at least for the most widely used cards. Networking also requires a TCP/IP stack implementation. There have been such attempts in Java, but their suitability should be evaluated. Another pressing need is persistent storage access. The most obvious hardware for such storage is a hard disk, which means a hard disk driver must be developed. The HDD driver is followed by file system access libraries. Such access should be implemented at least for reading from a few of the most popular file systems. But the persistent storage of a future version of the OS need not be limited to hard disk drives. There are interesting options such as flash memory support and database access over a high-speed network card. And there is the simplest and most familiar way to get persistence: a USB memory stick. That option requires a USB driver stack to be implemented. It is not an easy task, but with such a stack implemented we can attract many more people to work with jEmbryoS.

In short: there is a pressing need for network card, hard disk drive and USB drivers.

Auxiliary software

System management facilities are very important if the system is looking for users. Such facilities include at least a text-based console with an extensive set of commands and tools. The console should also have some means of remote access for convenient administration. Besides general system management there is a need for facilities to manage a single Java task. In the time of cloud computing and widespread virtualization, the system can deliver a virtualized environment with many application or web servers available on demand, each in the form of a natively supported simple Java task. Every such task can be connected to a client account with easily managed resource utilization limits and a flexible financial plan. For such virtualization to be ready out of the box, single-task management facilities are needed.

In short: System management and virtualization support components are required for the system to be successful.

Hardware

With all those great optimization capabilities of the Java OS, existing hardware is just not able to deliver the best performance we expect. As mentioned above, the optimization capabilities of the Java OS are much greater than any existing OS can have. They are superior because of the additional information available to the Java OS in the form of program structure data. Another reason is the complete elimination of applications with direct access to the hardware, even in the form of supplying their own machine instructions. With all software working with the hardware indirectly and exposing its program structure to the OS, we have a very good opportunity to implement passthrough optimization across all running software, with all the required information available. This is just impossible for systems like Windows or Linux.

But what hardware can deliver the expected performance? To answer this question we need to look at the way hardware works. Many processors, when running a program, try to execute as many of its instructions at once as possible. What is possible is limited by the processor's internal logic and the information available to it. The processor is able to read a bunch of instructions and determine whether there is a sequence of data movements or a set of arithmetic instructions. Given that set, the processor can figure out how to execute the instructions simultaneously while preserving the overall result of the computation. But a typical success along this path involves only a few instructions, almost always fewer than 10. Now compare those best cases, with 10 instructions executed simultaneously, with the actual demand of a typical application. Consider the very common functionality of a Java web server. It uses a technology called the Servlet API. In essence it generates a very long string, which can be sent to the client's browser as HTML or PDF content, for example. The string is concatenated from many fragments prepared by the programmer when writing the servlet or its representation in the form of a JSP (another Java technology). Once we understand the process, it becomes obvious that a lot of the work can be done simultaneously: all those copy operations from the fragment buffers to the output string buffer. What are 10 simultaneous processor instructions compared to the millions of instructions required to create just one HTML page? Obviously there is a need for passthrough optimization, from splitting the creation of the HTML response into simultaneously executed parts all the way to hardware capable of moving a lot of strings from one place to another.
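
To make the claim about independent copy operations concrete, here is a minimal sketch in ordinary Java. It is an illustration only: the class ParallelAssemblySketch and its method are hypothetical, it uses a standard parallel stream rather than any jEmbryoS facility, and it stands in for what the proposed OS and hardware would do at a much lower level. Once the offset of each fragment in the output buffer is known, every copy touches its own region and can run at the same time as the others.

```java
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical illustration of assembling a response from fragments in parallel.
final class ParallelAssemblySketch {

    /** Concatenates the fragments into one buffer, copying them concurrently. */
    static byte[] assemble(List<byte[]> fragments) {
        int n = fragments.size();

        // Sequential step: a running sum of lengths gives each fragment's offset.
        int[] offsets = new int[n + 1];
        for (int i = 0; i < n; i++) {
            offsets[i + 1] = offsets[i] + fragments.get(i).length;
        }

        byte[] out = new byte[offsets[n]];

        // Parallel step: the target regions never overlap, so no locking is needed.
        IntStream.range(0, n).parallel().forEach(i -> {
            byte[] fragment = fragments.get(i);
            System.arraycopy(fragment, 0, out, offsets[i], fragment.length);
        });
        return out;
    }
}
```

Only the offset computation is inherently sequential, and it works on a handful of integers rather than on the page content itself. On a stock JVM the thread overhead will usually outweigh the gain for small pages; the sketch shows the structure of the work, not a measured speedup.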

Is such hardware possible? Yes, it is absolutely possible. Contemporary processors have plenty of room to be used more efficiently. There are billions of transistors in powerful processors, on an area of more than 500 square millimeters. Such a huge quantity is equivalent to tens of millions of simple elements like adders or subtractors, or it can be used for a very big register file with a total capacity of many megabytes. The only problem today is that there is no software capable of utilizing such a vast quantity of computing power. And now the Java way. Imagine heavily optimized web server response creation code executed on hardware with many thousands of simple arithmetic and logic units and an internal cache of at least a megabyte. All the hardware has to do to create a web page can be described as loading a lot of strings into the internal cache and ordering them in the required manner. The limiting factor here will most probably be the speed of loading and storing all those strings. But if the Java OS can manage the data in the processor's cache directly, there is no need to worry about storing it. When the web server response is ready inside the processor's cache, all we need is to move the data directly from the cache to the network card and, simultaneously or even earlier, start loading a new chunk of information into the cache. The performance should be really spectacular.

Is there hardware like that described above? Not exactly, but there is hardware relatively close to the proposed ideal. On one side there are multiprocessor architectures. On the side of traditional single-chip solutions there are technologies like VLIW processors. And remember: all those solutions require relatively little effort to be supported by the Java OS and, as a consequence, can make their way into the world of enterprise server computing without permission from behemoths like Intel or IBM.

In short: to get the best performance ever, the system needs specialized hardware. The hardware should have many simple computing units and a large internal cache with a very high data transfer rate. No internal management (like superscalar scheduling) is expected from the hardware. All hardware capabilities should be exposed to the Java OS and managed by software with the help of a highly optimizing compiler.

Potential usage

Just look at Linux and you can grasp a bit of the future prospects. And do not forget to add all the other systems: who in the world doesn't need a simple and quick OS? But there is a long way ahead of us ...

In its existing state the main area of use for jEmbryoS is its own development. But the ability to load Java code easily from a flash stick can be interesting for many Java programmers. It is very simple to get full control over your computer with a few lines of your own code, included in a jar file on the flash stick. Any serious goal, of course, requires a lot of coding. However, a few commands are included just to satisfy your curiosity: they show the list of PCI devices your computer has, or the processor information made available by the processor manufacturer. More information about the commands can be found in the Console application usage section.

Business target (executive summary)

The technology presented here is very capable of achieving great business results. Its potential area of domination includes many server applications. The domination is based on performance unachievable for competing technologies, and on Java, the de facto standard technology of the enterprise world. Java on the server side is currently limited to application servers, and with systems like jEmbryoS it becomes possible to reduce costs by eliminating the existing redundancy in technical support staff. A company running servers can make extensive use of the ability of Java programmers not only to write server applications (web services and so on), but also to extend the capabilities of those applications by working at the operating system layer. Routine maintenance of the operating system and applications is also fully manageable by Java programmers in the case of jEmbryoS. This means that the same staff can develop better server applications and also perform server maintenance by means of Java programs.

Another advantage of the Java OS is its inherent ability to run Java applications. This ability provides virtualization capabilities out of the box. From the customer's perspective there is just a Java application server or a Java web server running on the hosting company's hardware. But from the hosting company's point of view there can in fact be many Java web or application servers running on a single hardware server. All those servers are completely manageable in terms of resource consumption. With a Java OS running a set of Java servers it is very easy to control, for example, processor or memory usage. This easy control provides the means for very flexible service, even for the most demanding customers of server hosting.

In the area of cloud computing the Java OS technology is also very promising. Besides the standard cloud capabilities, the passthrough Java solution offers some very attractive opportunities, such as live application migration from one server to another (without interrupting request servicing) or server software updates without interrupting the hosted software.

With all the brilliant prospects described above, it is worth noting that the cost of getting the technology to work is very small compared to any competing solution. For example, it takes just a few human-years to get a production-ready system able to run open source servers like Tomcat or jBoss. Once more: the cost of entering the server market in its multitude of forms is very low, and the financial results can be very high.

Now a bit about hardware. The system can show its best performance only with dedicated hardware support. Assessing the production costs of such hardware is out of the scope of this text, but the price is not very high. And we should remember that the best for the Java OS means superior to any other OS (see Possible hardware solutions for more information). If the OS uses the same hardware as other systems do, it can still benefit from the tight integration of applications and the operating system on one side and from the heavy optimization capabilities of the Java OS on the other. The second viable advantage, the Java technology with its own benefits, will always be available regardless of the hardware used. All this means that the Java OS is a promising solution even when based on existing hardware.

In short: the market of server applications of different kinds, ranging from cloud computing to virtualization solutions, is relatively easy for the Java OS technology to penetrate. The costs of entering the market are low. The potential return on investment is very high.

System architecture

jEmbryoS is a 32-bit Java operating system. The goal of the Java OS is simplicity, and jEmbryoS was built with this goal in mind. As the general architectural approach, strict adherence to the JVM specification was chosen. While this is not efficient because of the relatively high level of the specification, the approach achieves the goal of simplicity. Having such a simple solution in place does not prevent developers from achieving good performance, because relatively little effort is needed to replace simplified components in future versions. And even with the system in its inefficient state, optimization work always has a workable base for testing and for evaluating architectural solutions.

Another important gain from the simplified overall architecture is the set of useful components and technologies it makes available. The components are in a workable state and have been evaluated and tested during the system development. This allows not only direct use of the components, but also gives good experience in evaluating each component's pros and cons.

In short: The goal of the system architecture was simplicity. A simple system offers an easy entry path for all interested developers and increases the system's popularity.

High level view

At the highest level of abstraction the system is represented by a number of isolated layers. Such a layered approach addresses the contradictory goals of software in general - to be reusable and flexible at the same time.

The lowest layer isolates the core: the most reusable set of components, most essential to the system functionality. Because the system layer should stay stable whatever problems other layers may have, the isolation approach addresses another important concern - reliability. This is a general concern for all operating systems, but its resolution differs significantly among them. While isolating the system functionality in a module called the kernel is a common solution, implementation details can compromise this approach. For example, an incorrect driver can easily crash some operating systems because of compromised isolation.

The middle layer represents the domain-oriented part of the system. It includes components that require substantial rearchitecting only when the system is retargeted from one domain to another. For example, when an embedded system is needed, its user interface will most probably be significantly rearchitected in comparison to a desktop OS. The layer essentially works as a server for client requests and does almost nothing else. That's why it is called the service layer.

The highest layer represents a fine-grained tuning level that lets the system accomplish its goal in the most efficient manner. The minimal element of tuning is an application. Adding or removing applications allows the system owner to adjust the system to particular requirements.

Now about layer interaction. A simple and convenient approach is to use a service abstraction and implement the interaction facilities as an injectable object that provides all services of the underlying layer. The system and service layers have defined interfaces for use by higher layers. At the moment the interfaces provide a small subset of potentially useful functions and represent just a starting point for future versions of the system. Having no rigidly defined interaction facilities gives another bit of flexibility to the still far from mature system.

In short: The system follows a layered architecture approach. There are three layers defined: the system layer, the service layer and the application layer.

The system layer

When the system starts there is nothing in memory, and a special component called the 'bootloader' ensures the system image is copied from persistent storage into memory at a predefined address. After the image is placed where it should be, the bootloader jumps to the first instruction in the image. From there the system creates its internal structures and initializes the hardware. This is the responsibility of the bootstrapping code.

Then, for the Java code to work properly, the set of capabilities defined in the JVM specification must be present. The set ranges from support for primitive actions up to class management and garbage collection. Simple capabilities are implemented using a combination of data structures and variable management facilities. More complex things are implemented as separate modules like the Class Loader or the Garbage Collector. An essential part of the capabilities is built into the system image at system build time, but not everything is viable to include in the image in prebuilt form. In particular, many objects are created at the initialization stage. And that is the most convenient way to work with Java objects - just let the JVM create them.

But objects cannot exist without memory, so the system has capabilities for keeping and updating object state: the memory management facilities. Memory management starts when there is a need to create an object. If we recall that everything in Java, except primitive types, is an object, the importance of memory management becomes obvious. There are two parts of memory management - memory allocation and memory reclamation (deallocation). The first part marks some place in memory as occupied. When the second part detects that the object in that place is no longer accessible from any other Java object, static field or the execution stack, it marks the place as free and available for new objects. The first part is named 'Dynamic Memory' to distinguish it from the static chunks of memory allocated to the current task as a whole. The dynamic memory is optimized for frequent allocations and deallocations, while the static memory is just a very simple linked list. The second part is the garbage collector, which is responsible for detecting and reclaiming unused regions.

Besides memory, there is a need to work with hardware and to somehow isolate and manage applications. Working with hardware is in essence very simple - we just read from and write to some memory locations. But to work with hardware efficiently we have to know the moment when the hardware finishes what we asked it to do. The signal about such an event is traditionally implemented in the form of so-called 'interrupts'. An interrupt happens when the processor receives a notification from the hardware and stops executing our main program. After the program state is saved, the processor jumps to a helper procedure intended to react in some way to the completed hardware operation. When such a change in program flow occurs we have a process switch - from the main program to the helper procedure. Another process switch can happen when we wish to run more than one task simultaneously on a single processor. This is accomplished in the same manner as with interrupts - a timer interrupt tells us the moment when we should switch from one task to another. Which task to switch to is decided by the process management facilities. In particular there is the scheduler component, which remembers the run time of every task and calculates when it is fair to let a particular task run again.
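The scheduling idea can be illustrated with a minimal sketch. The text does not describe the actual fairness rule, so the code below assumes the simplest one: on every timer interrupt the running task is charged for the elapsed ticks and the task with the smallest accumulated run time goes next. All class and method names are hypothetical and not taken from jEmbryoS.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names and the fairness rule are assumptions.
final class FairSchedulerSketch {
    static final class Task {
        final String name;
        long runTime; // accumulated run time in timer ticks

        Task(String name) { this.name = name; }
    }

    private final List<Task> tasks = new ArrayList<>();
    private Task current;

    void add(Task task) { tasks.add(task); }

    // Imagined to be called from the timer interrupt handler.
    // Charges the elapsed ticks to the running task, then picks
    // the task with the smallest accumulated run time.
    Task onTimerInterrupt(long elapsedTicks) {
        if (current != null) {
            current.runTime += elapsedTicks;
        }
        Task next = null;
        for (Task t : tasks) {
            if (next == null || t.runTime < next.runTime) {
                next = t;
            }
        }
        current = next;
        return next;
    }
}
```

With two tasks and equal time slices this rule simply alternates between them; a task that was charged more time waits until the others catch up.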

And finally there is the general task of hardware detection and basic hardware management. Only the most common hardware is initialized and managed by the system layer. For example, there are almost always a system clock, a system timer, a PCI bus, an interrupt controller, memory and a processor - this is the currently supported list of hardware handled in the system layer.

The system layer is located in the jEmbryoS project. The location of the system layer service is as follows:

The service implements the publicly accessible interface from:

In short: The list of the most essential system responsibilities addressed by the system layer looks like this:

The service layer

The goal of the service layer is to define a suitable environment for the application layer within a target domain. As such, the implemented layer has responsibilities in the following areas: the user interface, drivers and security.

The user interface is represented by the Console task. The Console listens for key presses and translates them into actions visible to the user. Its design is based on the well-known model-view-controller pattern, with a hierarchy of views and corresponding controllers that often incorporate a very small model (the state of the view). This pattern is very popular within the Java community and doesn't require additional explanation. The Console is also responsible for initializing the service layer, a consequence of how primitive the service layer currently is. During initialization, besides its own internal components, the Console creates the minimal set of drivers currently in use.

The drivers, as a means of hiding the complexity of hardware interaction, expose their interfaces to the service layer. Each interface defines a simplified model of the underlying hardware. The service layer implementation enforces any access requirements defined for a driver interface (security, for example). And this is not the only reason the drivers are placed within the service layer. Because of their inherent variability the drivers can be managed more efficiently under the service layer umbrella. For example, if there is a security requirement to ask the user to confirm a new driver installation, it would be very inconvenient to do it from the system layer.

Security is a useful service only for some kinds of operating systems. Since it is not always required, it is placed within the service layer. At the moment security is disabled in jEmbryoS, but all the essential parts are defined to enable it when the need arises.

The service layer is located in the jEmbryoService project. The location of the service layer service is as follows:

In short: The service layer is responsible for making the system suitable for particular purposes. At the moment the layer is very simple and should be extended to let the system work in a production environment.

The application layer

Applications in jEmbryoS are just plain Java classes with a starter method defined. The starter method has the well-known name 'main' and takes one argument - a String array of the method's parameters. The class of the starter method can also define a field of the service layer interface type, and the system will inject a value for this field before the method takes control. Everything else is just plain old Java. The only difference from other JVMs is the injection of the service layer accessor object.
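To make the injection idea concrete, here is a minimal sketch that runs on an ordinary JVM. The interface `ServiceLayerSketch`, the launcher and the choice of a static field are all assumptions made for illustration; the real service layer interface and the way jEmbryoS performs the injection are not shown in this text.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical stand-in for the service layer interface.
interface ServiceLayerSketch {
    void print(String text);
}

// An application: a plain class with 'main' and a field of the service type.
final class HelloApp {
    static ServiceLayerSketch services; // filled in by the launcher before main runs

    public static void main(String[] args) {
        services.print("hello " + args[0]);
    }
}

// Imitates what a system might do when it starts an application.
final class LauncherSketch {
    static void launch(Class<?> app, ServiceLayerSketch services, String... args) {
        try {
            for (Field f : app.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers())
                        && f.getType() == ServiceLayerSketch.class) {
                    f.setAccessible(true);
                    f.set(null, services); // inject the accessor object
                }
            }
            Method main = app.getDeclaredMethod("main", String[].class);
            main.setAccessible(true);
            main.invoke(null, (Object) args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot start " + app.getName(), e);
        }
    }
}
```

A call such as `LauncherSketch.launch(HelloApp.class, text -> System.out.println(text), "world")` sets the field first and only then passes control to `main`, so the application never sees an empty accessor.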

In short: Applications in jEmbryoS are very simple - each is just a Java class.

100% Java

Simplicity for Java programmers is also an important goal here. It is achieved by implementing 100% of the system functionality in Java. There is no line of code that requires anything more than a text editor and a Java compiler. But, of course, there must be some means to access machine-level functionality. The approach chosen for such access resembles a software library whose users just call a predefined set of functions and do not think about the implementation, but who can go inside and change its inner classes when they want to extend the library. In this machine-level 'library', Java native methods are chosen as the interface between the developer and the hardware. The hardware here in most cases is the general processing unit, also known as the processor. The processor has a set of instructions and some internal state held in registers. When an instruction is executed, some registers can be updated with new values, or bits at some memory location can be changed. To invoke a machine instruction a developer just writes an ordinary method call, passes the required arguments and moves on to the next instruction, for example to check the result of the previous one. The only problem here is the form in which the machine instructions are represented. In the current implementation an assembler-like form was chosen. Assembly language has proven useful and convenient for working with machine-level commands. It is also very well known among developers and there are a lot of tutorials on the internet - these are the arguments for the assembler-like form of the internals of the machine-level access library. More information about the assembler library can be found here.

Another piece of hardware accessed through the machine-level library is memory. In the current implementation the memory model is very simple - it is an array of bytes, indexed from zero up to the limit of the available memory. Because the byte is not a very usable primitive type in the Java language, there are also three arrays of types short, int and long. Because each primitive type requires a different number of bytes, the meaning of the array index differs: for the short array the index counts 2-byte sequences, for the int array - 4-byte sequences, for the long array - 8-byte sequences. Another issue with the arrays is the limit on array length defined in the JVM specification. With this limit the byte array can address just two gigabytes minus one byte. Since a lot of computers have considerably more than 2 gigabytes of memory, it is an unpleasant limitation. But luckily even the short array can address up to 4 gigabytes of memory, which is enough for many contemporary computers. Next, the int array can address up to 8 gigabytes and the long array - up to 16 gigabytes. But what to do with machines that have more than 16 GB of memory installed?
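The index arithmetic behind these numbers is easy to check with a few lines of Java. The helper below is purely illustrative (the class and method names are assumptions, not jEmbryoS code): it converts a byte address into an element index for each typed view and computes how many bytes an array of the maximum length can reach.

```java
// Hypothetical helper, not part of jEmbryoS: index arithmetic for typed memory views.
final class MemoryIndexSketch {
    // A byte address divided by the element size gives the element index.
    static int shortIndex(long byteAddress) { return (int) (byteAddress >>> 1); }
    static int intIndex(long byteAddress)   { return (int) (byteAddress >>> 2); }
    static int longIndex(long byteAddress)  { return (int) (byteAddress >>> 3); }

    // Bytes reachable through an array whose length is limited by a positive int index.
    static long reachableBytes(int elementSize) {
        return (long) Integer.MAX_VALUE * elementSize;
    }
}
```

For example, `reachableBytes(1)` is two gigabytes minus one byte, while `reachableBytes(2)`, `reachableBytes(4)` and `reachableBytes(8)` come out just under 4, 8 and 16 gigabytes respectively.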

And here comes the addressing issue. If the usual primitive type int is chosen for memory addressing, we hit another limitation related to the JVM specification - the int type must have exactly 4 bytes in its internal representation, and one of those 32 bits is reserved for the arithmetic sign. It means we can address only 2 GB of memory using positive ints. In the current implementation the problem is solved by the selected target processor environment. The most widely used Intel processors have somewhere near 10 modes implemented. Among them the most widely used one is chosen, the so-called 'protected mode'. In this mode the processor can address only 4 GB of memory and uses a 4-byte address word. Luckily the int primitive type also has 4 bytes. But there is a sign bit, so how do we work with it? The current solution is simple - if you are going to work with memory directly, or even wish to touch the processor's registers, then you should know everything about bits and bytes. This means that if you have a negative int, you already know that bit 31 is used to represent the arithmetic sign, that it belongs to the highest byte of the 4-byte word and that it means nothing special to the processor when it loads the bytes into a register. So an int should just be passed to the native-level library without any change, and the processor will always treat it as an unsigned integer. Another source of misunderstanding with the int primitive type is its default representation. By default the printing libraries always show you the signed form of an int. Here one rule is always very useful - if you work with the hardware directly, the best form to print numbers in is hexadecimal. If you use this form, there is no misunderstanding. But for operations with addresses in Java we should still remember the nature of Java ints. To avoid mistakes here it is recommended to use very simple address arithmetic and comparison functions. In fact only the comparison functions are really needed; arithmetic is as simple as using the int in expressions and turning a blind eye to its sign (look at the hexadecimal representation).
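A minimal sketch of such comparison and printing helpers is given below. It relies on `Integer.compareUnsigned`, which exists in the standard library since Java 8; the project's own helper functions are not shown in this text, so the class `AddressSketch` and its method names are assumptions.

```java
// Hypothetical helper: treats a Java int as an unsigned 32-bit address.
final class AddressSketch {
    // Unsigned comparison: 0x80000000 is above 0x7FFFFFFF here,
    // although as a signed int it is negative.
    static boolean below(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0;
    }

    // Plain int addition wraps around modulo 2^32, which is exactly
    // what unsigned address arithmetic needs.
    static int add(int address, int offset) {
        return address + offset;
    }

    // Hexadecimal form with all 8 digits, so the sign never confuses the reader.
    static String hex(int address) {
        return String.format("0x%08X", address);
    }
}
```

With these helpers `below(0x7FFFFFFF, 0x80000000)` is true, while the plain Java expression `0x7FFFFFFF < 0x80000000` is false, which is the trap the paragraph above warns about.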

But future versions of the Java OS should use a different processor mode, capable of addressing much more than 16 GB. For such a mode to be adopted, the primitive type used as an address should be changed to long. This will eliminate all problems with the int address type except one - it will still be impossible to use arrays longer than roughly 2 billion elements. The solution for the array index problem should be to replace the arrays with a memory access function. The function can be defined with any imaginable set of arguments including long, obviously.

In short: To achieve the 100% Java mark there must be a means of access to the machine level. An assembler-like representation has been chosen for the hardware interaction level. But to stay as close to Java as possible another means of access is provided - the memory arrays. All other hardware functionality is accessed using Java methods, marked as native to highlight the low-level access.

Data structures

Everything in jEmbryoS is designed as a Java object, and as a consequence data structures matter only when we consider object internals. An object, as an instance of a Java class, involves two data structures - the structure of a class instance (the object) and the structure of the class itself. Here we have a bit of a pun with the words class and instance, but those are the official Java names. In other words, the structures can be described as a type structure and a type instance structure. The first is a representation of the static fields of a Java class and the second is a representation of the fields of a class instance (an object). And there is a third structure - the structure of an array. No more structures are required to understand the jEmbryoS internals.

It should be mentioned that the phrase 'class structure' does not refer to an instance of the class Class - that is another point of confusion we have to avoid. There are classes in Java, there are their representations (instances of the class Class), and there are instances of classes, called 'objects'. Those are three different entities and should not be confused.

Every object in jEmbryoS has a header field which is 4 bytes long. Recall that an object is a class instance or an array, and that the class structure differs from the object structure. The header is used to distinguish arrays from class instances, to determine the array type, to perform memory management and to make it possible to use any object as a monitor when Java code enters a synchronized section.
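How a 4-byte header can carry all this information is easiest to see with bit masks. The bit assignment below is invented for illustration only; the real masks are defined in the project's *MemoryMap classes and may look quite different. The class name `HeaderSketch` and every constant in it are assumptions.

```java
// Hypothetical bit layout of a 4-byte object header (not the real jEmbryoS layout).
final class HeaderSketch {
    static final int ARRAY_FLAG       = 0x8000_0000; // 1 = array, 0 = class instance
    static final int ARRAY_TYPE_MASK  = 0x0F00_0000; // element type code of an array
    static final int ARRAY_TYPE_SHIFT = 24;
    static final int GC_MARK          = 0x0080_0000; // used by memory management
    static final int MONITOR_LOCKED   = 0x0040_0000; // object is in use as a monitor

    static int arrayHeader(int typeCode) {
        return ARRAY_FLAG | ((typeCode << ARRAY_TYPE_SHIFT) & ARRAY_TYPE_MASK);
    }

    static boolean isArray(int header) {
        return (header & ARRAY_FLAG) != 0;
    }

    static int arrayType(int header) {
        return (header & ARRAY_TYPE_MASK) >>> ARRAY_TYPE_SHIFT;
    }

    // Sets or clears the mark bit without touching the other header bits.
    static int withMark(int header, boolean marked) {
        return marked ? (header | GC_MARK) : (header & ~GC_MARK);
    }
}
```

Each concern gets its own group of bits, so the garbage collector can flip its mark bit without disturbing the array type or the monitor state.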

To simplify access to the structures, a number of classes are defined as access helpers. The helpers isolate the structure details and make structure changes relatively simple. All access to the internal structures in jEmbryoS is performed through the helper classes. Here is the location information for the helper classes:

Header masks in *MemoryMap classes define the bit assignment of the header bytes. Structure offset information in those classes defines memory layout of bytes for objects and classes.

Usually there is no need to use the structure information directly, since all the mess is handled by the accessor classes. For example, the class InstanceMemory can be used to get the required header information and to access an object's fields directly in memory instead of the standard Java way, with an object reference and the member access operator (the dot).

Another important point about the data structures is the information about the structure type. Every class instance has a reference to an instance of the ClassData class and as such can be introspected easily. Every array has information about its data type. Primitive arrays keep this information in the header, and arrays of a reference data type have the same kind of information as every class instance - a reference to the ClassData instance of the array's data type. The ClassData class represents complete information about a Java class. The JVM cannot have more information about a class than its ClassData instance has.

The class instance structure contains fields from multiple classes - the current class and its ancestors. The fields from every class in the hierarchy are grouped in separate sections aligned on 4-byte boundaries.

The class structure, besides static fields, has information about all implemented interfaces (including those of ancestors), followed by the addresses of all methods, including those of ancestors. First go the addresses of the Object class methods, then the methods of its successor and so on down to the current class. Method addresses are ordered in the same sequence as in the source code.

Here is the object internal structure (class instance structure):

Here is the array internal structure:

Here is the class structure:

In short: jEmbryoS has just three data structures - the class instance, the class structure and the array. There is no need to access the structures directly through raw memory access, because special structure accessor classes are implemented in the jEmbryoCore project.

Method call implementation

The subject of call implementation is very important because it lays the foundation for the JVM architecture. It ties together such issues as memory management, performance optimization, compilation, process management and some others. Because of this cornerstone role it is not wise to try to invent the best ever method call implementation until all the listed issues are deeply understood and there is a clear picture of their interdependencies. As a system without an optimizing compiler, the current jEmbryoS implementation stays away from thoughts about the best method call and has been developed with simplicity in mind. On one side this implies a very noticeable amount of rearchitecting later, but on the other there could be much more to redo if an initial 'best' way proved to be 'not so good' because of a multitude of unaccounted issues. That's why the method call architecture was defined in existing terms and does not pretend to the title of 'best ever'. This approach was very helpful in reducing development time, a big plus for me :)

Now - the call. First, there are stacks. The JVM specification operates with two storages for values - the bytecode stack and the variable storage (the operand stack and the local variables, in the specification's terms). Every time a method is invoked both storages must be available. With a straightforward implementation, obviously, there should be two entities - the bytecode stack and the variable stack. The first keeps data for the stack-operating bytecodes (JVM instructions), and the second creates a new item for every method invocation and deletes it after return from the method. For these stacks we need pointers to their tops. The pointers are kept permanently in two dedicated processor registers: ESI for the bytecode stack and EDI for the variable stack. Together with the EBP and ESP registers they define the execution context. Because there are just 8 general purpose registers in the protected mode of Intel processors, it is impossible to keep the entire context (all of its data) in processor registers. That's why the EBP register is dedicated to keeping the context reference. Another register - ESP - is used as recommended by Intel: it keeps track of the system stack. Now we have 3 stacks - bytecode, variable and system.

But because of the foundation role mentioned earlier, 3 stacks are still insufficient. Two more stacks are introduced to ensure that memory management can work properly. There is a need to distinguish a primitive variable from an object reference; this lets the garbage collector keep an object's memory even if there are no references to it from other objects. When an object is referenced only from a local variable, we have to know whether the variable points to an object or holds a primitive value. Without such knowledge the garbage collector would reclaim the object's memory, and the system would crash on an attempt to execute the object's method, whose address is looked up in the now invalid object data structure. This is the reason for having two stacks that keep track of the variable type - primitive or object reference. Each of the two stacks is attached to its master - the bytecode stack or the variable stack - and keeps a one or a zero for each variable slot in the master. One means an object reference and zero means a primitive type. The slave stack pointers are kept in the corresponding fields of the context, and the slave stacks always mirror the state of their masters.
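The master/slave pairing can be sketched in plain Java as two parallel arrays: an int array for the 4-byte slots and a byte array for the tags. This is only an illustration of the idea under stated assumptions; the class `TaggedStackSketch` and its methods are hypothetical and say nothing about how the real stacks are driven through processor registers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a master stack with a mirrored slave (tag) stack.
final class TaggedStackSketch {
    private final int[] slots;  // master stack: 4-byte items
    private final byte[] tags;  // slave stack: 1 = object reference, 0 = primitive
    private int top;            // number of used slots

    TaggedStackSketch(int capacity) {
        slots = new int[capacity];
        tags = new byte[capacity];
    }

    void pushPrimitive(int value)   { push(value, (byte) 0); }
    void pushReference(int address) { push(address, (byte) 1); }

    private void push(int value, byte tag) {
        // The array length doubles as the overflow limit.
        if (top == slots.length) {
            throw new StackOverflowError("stack array limit exceeded");
        }
        slots[top] = value;
        tags[top] = tag;
        top++;
    }

    int pop() {
        if (top == 0) throw new IllegalStateException("stack is empty");
        top--;
        return slots[top];
    }

    // What a garbage collector would treat as roots: only slots tagged as references.
    List<Integer> referenceSlots() {
        List<Integer> roots = new ArrayList<>();
        for (int i = 0; i < top; i++) {
            if (tags[i] == 1) roots.add(slots[i]);
        }
        return roots;
    }
}
```

Note that the same bit pattern can sit in the master stack twice, once as a number and once as an address; only the tag tells the collector which of the two keeps an object alive.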

Now about stack organization. The bytecode, variable and system stacks are sequences of 4-byte items. The slave stacks of the bytecode and variable masters are organized as sequences of bytes. They are bytes, not bits, for simplicity and performance reasons: bit access is slower because of the bit extraction overhead. All stacks have corresponding array references within the context data area. Because the context is an object, this area is in fact just an object data structure, and the stack arrays there are also just ordinary Java objects. Because an array in jEmbryoS always has a pointer to its data, the stack's data area can be placed independently of the array's location. Another benefit of the array representation is the array's mandatory length field. With this information in one place it is very simple to implement stack overflow checking. The overflow, obviously, happens when the stack's array limit is exceeded. It is not the best variant of overflow handling, because every kind of stack has different size requirements: while one kind of stack still has a lot of free space, another can already be out of memory. In other words, the depth at which a stack overflows varies depending on many factors. But it is not an urgent issue, I hope :)

Another important thing about the JVM is its debugger friendliness. To have a technology without the means to debug it - that's too bad. So jEmbryoS has another stack - the trace stack. The class of the current method, the index of the current method within the class's method array and the current line in the source code - all this information is kept in the trace stack. Also, in order not to increase the number of stacks, the trace stack keeps the system stack pointer as it was at the beginning of the method call, to be able to restore it in case of an exception. It also keeps the address of the currently active exception handler (or null if there is no exception handler at this point). The trace stack is organized in exactly the same manner as its neighbors - as an array of 4-byte items (the primitive type int).

And that's all, there are no more stacks for today :)

With the stacks described we can trace the JVM's actions upon a method call. First, the method arguments are copied right after the current variable stack pointer. Then all stack pointers are preserved on the system stack. Then the invoked method increases the stack pointers according to its needs, and because its needs include the method arguments, the previously copied values turn out to be at exactly the correct position within the variable stack. The bytecode stack pointer is not increased, because that stack is temporary, on-demand storage. The called method also fills in its trace stack item with the address of the ClassData object, the method index and a zero source line. The method also saves the current system stack pointer in the corresponding slot of the trace stack. And there is another point to consider - overflow checks are performed when the method parameters are filled in and when the called method increases the stack pointers.

During method execution the compiler inserts trace stack update instructions at certain points to keep track of the current line in the source code and of the address of the currently active exception handler.

When a method returns, the sequence of actions is as follows. The EAX register contains the return value (if any); if the return value takes 8 bytes (long or double), it is placed into the MM0 register instead. Then the stack pointers are restored to their state at the moment the method started, effectively discarding everything no longer needed in the stacks. Then the return value (if any) is pushed onto the bytecode stack and the outer method continues its execution. If an exception was thrown in the called method and there is no exception handler for it there, the throw handler restores the system stack pointer, sets the EBX register to point to the exception object and returns (executes the ret instruction). After the return all stack pointers are restored just as with a 'no exception' return. The outer method checks EBX for an exception address, and if it is not null, looks in its own slot of the trace stack for an exception handler address and jumps there if one is found. If there is no exception handler, the throw handler is called again and the process continues recursively.

In short: Method execution depends on temporary data storages - the stacks. There are 6 stacks - bytecode, variable, system, trace and 2 slave stacks for the bytecode and variable masters. From Java the stacks are visible as ordinary arrays. When a method is invoked, the stack pointers are updated and the method operates in its own stack frame. When a method returns, the return value is placed into the EAX register, or into the MM0 register if the value is 8 bytes long. If there was an exception, the EBX register contains the address of the exception object.

Memory management

There are two parts of memory management - memory allocation and memory reclamation. Memory is allocated at two levels - first it is allocated in big chunks for a new task, and second it is allocated by the task for the needs of the currently executing methods. The task also allocates memory for the new classes it loads and the new methods it compiles. The task can ask the JVM to extend its memory and, if the JVM accepts the request, adds the newly acquired chunks to its existing limits. When a task finishes, all its memory is reclaimed in the form of the same big chunks. Internal allocations within the task are reclaimable only when they were made for new objects; method and class structure space is not reclaimable. For internal memory allocation the task has its own internal allocator objects. For memory reclamation the task has its own garbage collector object. If the task has more than one thread, all threads use only the task's memory, and the memory allocated for threads is reclaimed by the task's garbage collector. This means that a task is a module independent and isolated from the JVM and other tasks, while threads are completely dependent on the task's memory management.

In short: Memory is allocated by the currently executing task using its own internal memory allocators and deallocated by the task's internal garbage collector. The task's memory itself is allocated by the JVM at the task creation stage and can be extended at the task's request. After the task finishes, its memory is deallocated by the JVM.

Memory allocation

The hierarchy of memory allocation looks like the following:

The separation of memory allocation types by size is an attempt to minimize the memory overhead of the structures responsible for memory allocation. Allocations larger than 512 KB are rare and can be managed effectively as a simple list of occupied regions. Allocations of less than 256 bytes are very frequent and require special attention to memory and speed overhead. Allocations in between are currently managed using a doubly linked list with additional size range information.
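The size thresholds named above can be written down as a small dispatch function. The enum and class below are hypothetical and only restate the three ranges from the text: less than 256 bytes, more than 512 KB and everything in between.

```java
// Hypothetical names; only the two thresholds come from the text.
enum AllocationKind { SMALL_CLUSTER, MEDIUM_LIST, LARGE_LIST }

final class AllocationPolicySketch {
    static final int SMALL_LIMIT = 256;        // bytes; smaller requests go to clusters
    static final int LARGE_LIMIT = 512 * 1024; // bytes; larger requests go to the simple list

    static AllocationKind classify(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive: " + size);
        }
        if (size < SMALL_LIMIT) return AllocationKind.SMALL_CLUSTER;
        if (size > LARGE_LIMIT) return AllocationKind.LARGE_LIST;
        return AllocationKind.MEDIUM_LIST; // doubly linked list with size ranges
    }
}
```

The boundary values themselves (exactly 256 bytes and exactly 512 KB) fall into the middle range here, following the wording "less than" and "more than".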

The location of memory allocation related classes is as follows:

Package org.jembryos.memory contains the MemoryManager class. It is the JVM's main memory allocator, used when it allocates space for tasks. The MemoryManager keeps track of allocated and free memory regions using linked lists. Package org.jembryos.memory.maps contains the task's internal memory allocators. The different allocators are described next.

The linked list based memory allocation is very simple - it uses subclasses of the MemoryRegion class to keep track of the address and size of each allocated region. When a region is deallocated, it is simply removed from the linked list of allocated regions and moved to the list of free regions. Such a move can be accompanied by region merging when adjacent free regions are detected. For the sake of speed it is worthwhile to split the range of region sizes the manager can allocate into smaller parts, which speeds up the search for a region of matching size. Also for speed reasons, an ordering constraint is imposed on some of the linked lists.
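The list based scheme can be illustrated with a minimal sketch. It is not the jEmbryoS code: the Region and ListAllocatorSketch classes and all their methods are hypothetical names, a first-fit search is assumed, and the free list is assumed to be the one kept ordered by address.

```java
import java.util.LinkedList;
import java.util.List;

// Hypothetical stand-in for a MemoryRegion subclass: just an address and a size.
final class Region {
    final long address;
    final long size;
    Region(long address, long size) { this.address = address; this.size = size; }
    long end() { return address + size; }
}

// First-fit allocator over an address-ordered free list (all names are illustrative).
final class ListAllocatorSketch {
    private final List<Region> freeList = new LinkedList<>();   // kept ordered by address
    private final List<Region> allocated = new LinkedList<>();

    ListAllocatorSketch(long base, long size) { freeList.add(new Region(base, size)); }

    // Returns the address of the new region, or -1 when no free region is big enough.
    long allocate(long size) {
        for (int i = 0; i < freeList.size(); i++) {
            Region r = freeList.get(i);
            if (r.size < size) continue;
            freeList.remove(i);
            if (r.size > size) freeList.add(i, new Region(r.address + size, r.size - size));
            allocated.add(new Region(r.address, size));
            return r.address;
        }
        return -1;
    }

    // Moves the region to the free list, merging it with adjacent free neighbours.
    boolean release(long address) {
        for (int i = 0; i < allocated.size(); i++) {
            if (allocated.get(i).address != address) continue;
            Region r = allocated.remove(i);
            int pos = 0;
            while (pos < freeList.size() && freeList.get(pos).address < r.address) pos++;
            if (pos < freeList.size() && freeList.get(pos).address == r.end()) {
                r = new Region(r.address, r.size + freeList.remove(pos).size); // merge with next
            }
            if (pos > 0 && freeList.get(pos - 1).end() == r.address) {
                Region prev = freeList.remove(pos - 1);                        // merge with previous
                r = new Region(prev.address, prev.size + r.size);
                pos--;
            }
            freeList.add(pos, r);
            return true;
        }
        return false;
    }

    int freeRegionCount() { return freeList.size(); }
}
```

Releasing every region in any order should leave a single free region again, which is the effect the merging step is meant to have.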

There are a number of helper allocators suitable for particular situations. The first and simplest is the MemoryBuffer class, which just adds the size of a region to an internal offset and does nothing more. Obviously, such regions are not individually reclaimable and can be deallocated only as a whole. The second simple allocator is the AddressArrayMemoryMap. It remembers the addresses of all allocated regions and stores them in sorted order.
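The MemoryBuffer idea is small enough to show in a few lines. The class below is only a sketch of the technique (often called a bump allocator); its name, fields and methods are assumptions and do not come from the jEmbryoS sources.

```java
// Hypothetical sketch of a MemoryBuffer-like allocator: allocation only advances an offset.
final class BumpBufferSketch {
    private final long base;    // address of the first byte of the buffer
    private final long limit;   // total size of the buffer in bytes
    private long offset;        // number of bytes already handed out

    BumpBufferSketch(long base, long size) { this.base = base; this.limit = size; }

    // Returns the address of the new region, or -1 if the request does not fit.
    long allocate(long size) {
        if (size <= 0 || offset + size > limit) return -1;
        long address = base + offset;
        offset += size;
        return address;
    }

    // Individual regions cannot be freed; the buffer is released only as a whole.
    void reset() { offset = 0; }
}
```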

Next come the most used memory allocators. All of them organize clusters of elements of the same size. The address alignment issue prevents us from allocating object memory regions at an arbitrary position in memory: Intel processors work at least twice as slowly with unaligned memory, and some other processors refuse to work with it at all. But this constraint gives us the idea of clustering memory regions. All regions allocated for small objects (less than 256 bytes) are organized in such clusters. The cluster sizes start at the minimal object size of 8 bytes and go up in steps of 4 bytes. In total there are 62 clusters. Such clusters allow us to use bit masks to mark occupied regions, which is relatively efficient in terms of memory overhead. This organization also manages memory fragmentation effectively. Speed, however, is an area where additional research should be done. Currently there are three implementations of the region clustering allocators - int based, long based and int array based. Tests under a standard JVM show a small advantage for the array based variant. But recalling that the optimization technology to be used is still an open question, it seems too early to select the memory allocation technology now. That's why jEmbryoS currently uses long based clusters, and there are no plans to work on a particular memory allocation technology until the overall optimization approach is selected.

Using different memory allocation facilities at the same time requires an integration module to simplify system development. Such a module is presented to memory allocation clients in the form of the DynamicMemory interface. Its implementation is placed in the maps subpackage and is called DynamicMemoryBitMapCluster. It unifies three different memory allocators - simple list based, linked list based and cluster based.

In short: jEmbryoS uses several different memory allocators. Each allocator is optimized for a particular usage area. Usage areas are classified by the size of the allocated regions. The most demanding size range is below 256 bytes.
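A long based cluster can be sketched as follows. The size classes follow the numbers given above (8 bytes minimum, 4 byte step, 62 clusters, so the largest class is 8 + 61 * 4 = 252 bytes). Everything else is an assumption made for illustration: the class name, the method names and the choice of 64 slots per cluster, one per bit of the mask.

```java
// Sketch of a long based cluster: one 64-bit mask tracks 64 equally sized slots.
final class LongClusterSketch {
    static final int MIN_SIZE = 8;        // smallest object size, from the text
    static final int STEP = 4;            // size step between clusters, from the text
    static final int CLUSTER_COUNT = 62;  // size classes 8, 12, ..., 252

    // Maps a requested size (1..252 bytes) to the index of its size class.
    static int clusterIndex(int size) {
        if (size <= MIN_SIZE) return 0;
        return (size - MIN_SIZE + STEP - 1) / STEP;   // round up to the next class
    }

    // Slot size served by the given size class.
    static int slotSize(int clusterIndex) { return MIN_SIZE + clusterIndex * STEP; }

    private final long base;      // address of slot 0
    private final int slotSize;   // bytes per slot
    private long occupied;        // bit i set means slot i is in use

    LongClusterSketch(long base, int slotSize) { this.base = base; this.slotSize = slotSize; }

    // Finds the lowest free slot, marks it and returns its address; -1 when the cluster is full.
    long allocate() {
        long freeBits = ~occupied;
        if (freeBits == 0) return -1;
        int slot = Long.numberOfTrailingZeros(freeBits);
        occupied |= 1L << slot;
        return base + (long) slot * slotSize;
    }

    // Clears the bit of the slot that starts at the given address.
    void free(long address) {
        int slot = (int) ((address - base) / slotSize);
        occupied &= ~(1L << slot);
    }
}
```

The memory overhead of such a cluster is one bit per slot, which is why the bit mask approach is attractive for very small objects.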
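The routing performed by such an integration module might look like the sketch below. It only shows the decision by size. The thresholds come from the text, while the class name, the enum and the handling of the exact boundary values (256 bytes and 512 KB) are assumptions.

```java
// Sketch of choosing an allocator by requested size (names and boundary handling are assumed).
final class SizeDispatchSketch {
    enum Kind { CLUSTER, LINKED_LIST, SIMPLE_LIST }

    static final long SMALL_LIMIT = 256;          // bytes; below this the clusters are used
    static final long LARGE_LIMIT = 512 * 1024;   // bytes; above this the simple list is used

    static Kind allocatorFor(long size) {
        if (size < SMALL_LIMIT) return Kind.CLUSTER;
        if (size > LARGE_LIMIT) return Kind.SIMPLE_LIST;
        return Kind.LINKED_LIST;
    }
}
```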

Garbage collection

The garbage collector is the memory reclamation facility for systems with automatic memory management. It uses a simple algorithm to find memory regions to be reclaimed. The algorithm traverses the object graph and marks the traversed objects as reachable. Then the memory regions of all unreachable objects are reclaimed. The currently implemented garbage collectors all work on this principle, with some additions to manage slightly more complex issues.

There are three types of garbage collectors implemented - 'stop the world', incremental and concurrent. The 'stop the world' garbage collector can work only if the current task stops its execution and waits until the garbage collector finishes. The incremental garbage collector also stops the task, but for a significantly shorter time. The incremental variant is generally more manageable and as such is suitable for systems with constraints on their reaction time. But the incremental collector currently does not work with threads. The concurrent garbage collector does not stop the serviced task and works as an independent simultaneous task. Potentially it is the most interesting solution to the reaction time issue, but there is a problem - when the collector works with the task's memory, its actions must be synchronized with memory allocation requests from the task or its threads. Such synchronization implies a substantial speed overhead and makes the minimal unit of work equal in size to the whole synchronized section. This means there is no ideal solution. Besides the reaction time issue there is the problem of memory being requested too often. Incremental and concurrent collectors are simply unable to traverse the whole object graph in a very short time. If there is excessive demand for new memory allocations, then at some point the speed of allocation can outrun the speed of deallocation, and in the end the system falls back on the mercy of the 'stop the world' garbage collector.

However, the garbage collectors are here and they really do their job. Now let's describe the details. The location of the garbage collector related classes is as follows:

The GarbageCollector class has three methods to run the collection in different modes. The names of the methods speak for themselves - runStopTheWorld, runIncremental and runConcurrent. All of them follow a single scenario. First they traverse all occupied memory regions and clear the header bit that denotes object reachability. Then the object graph trace is performed; it includes not only objects but also the class structures and the stacks of all the task's threads (in the case of the concurrent GC). Next the collector traverses the allocated regions again and looks at the reachability bit in the header. If the bit is zero, the region is reclaimed. In addition to this scheme, the incremental and concurrent collectors use help from the so-called 'intercepter', which emits additional code for every bytecode stack push. The emitted code checks whether a garbage collection is in progress and what kind of additional work the collector wants it to do. There are just two kinds of additional work: the first is to mark the pushed object as reachable, and the second is, in addition, to push the object onto the garbage collector's stack. The first action prevents an object from leaking from the not yet traced part of the object graph to the already traced part (when an object is moved it is always pushed onto the bytecode stack, where the emitted code intercepts it and marks it as reachable). The second action prevents a not yet traced referrer from escaping the trace and hiding the references it holds. There is also new object initialization code which marks all new objects as reachable while the garbage collection procedure is running. The described measures make it possible to find all unused objects and keep all used ones even if the collector's work is interrupted and memory allocations occur during such interruptions.

To simplify the garbage collector's work with the different memory allocation classes there is an interface called MemoryAllocator. It is placed in the package org.jembryos.memory and defines methods that let the collector traverse all occupied memory regions independently of the underlying type of memory allocator. The MemoryAllocator works in conjunction with the RegionVisitor interface. The collector implements the RegionVisitor interface and can trace any allocator implementing the MemoryAllocator interface. The RegionVisitor interface is also used for such purposes as calculating free or occupied memory.

Besides the general way of memory reclamation there is a manual solution for small systems such as a bootloader or an embedded OS. In such circumstances using the entire set of memory management classes is not optimal, and the simple solution is manual memory deallocation. It is implemented using the annotation FreeAfterSet, which is defined in the project jEmbryoJavaAPI so that it is accessible from any other project. The package of the annotation is org.jembryos.util. The annotation is intended to mark an object's field, and the compiler is prepared to emit the intercepter code every time the value of such a field is changed. When the intercepter code runs, it just calls a function which frees the memory previously occupied by the now overwritten value of the field. This means that manual memory management looks as follows: first, an object is instantiated and remembered in some field; second, simply changing the field value to another object or to null frees the memory occupied by the object created at the first step. But one thing should not be missed - freeing an object's memory does not free the memory of its children; that memory should also be freed by assigning new values to the child fields.
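The common three pass scenario (clear the bit, trace, reclaim) can be sketched for the 'stop the world' case. This is an illustration on ordinary Java objects rather than on raw memory regions: the Node class, its boolean field standing in for the header bit, and the collect method are all hypothetical, and the intercepter logic of the incremental and concurrent modes is not shown.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical object model: a reachability flag plus outgoing references.
final class Node {
    boolean reachable;                        // stands in for the header bit
    final List<Node> refs = new ArrayList<>();
}

final class MarkSweepSketch {
    // 'heap' plays the role of the occupied regions; returns the number of reclaimed objects.
    static int collect(List<Node> heap, List<Node> roots) {
        for (Node n : heap) n.reachable = false;        // pass 1: clear the reachability bit
        Deque<Node> stack = new ArrayDeque<>(roots);    // pass 2: trace the graph from the roots
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (n.reachable) continue;                  // already traced
            n.reachable = true;
            for (Node child : n.refs) stack.push(child);
        }
        int before = heap.size();                       // pass 3: reclaim what stayed unmarked
        heap.removeIf(n -> !n.reachable);
        return before - heap.size();
    }
}
```

Note that a cycle of objects which is not reachable from the roots is reclaimed as well, because reachability is decided by the trace and not by reference counts.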
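The cooperation of the two interfaces can be sketched like this. Only the interface names MemoryAllocator and RegionVisitor come from the text; the method names, their parameters and the two helper classes are assumptions made for the example.

```java
// Method names and signatures are illustrative; only the interface names come from the text.
interface RegionVisitor {
    void visit(long address, long size);
}

interface MemoryAllocator {
    void visitOccupied(RegionVisitor visitor);
}

// A visitor that sums up the occupied memory.
final class OccupiedCounter implements RegionVisitor {
    long total;
    public void visit(long address, long size) { total += size; }
}

// A toy allocator that exposes a fixed set of {address, size} pairs as occupied regions.
final class FixedRegionsAllocator implements MemoryAllocator {
    private final long[][] regions;
    FixedRegionsAllocator(long[][] regions) { this.regions = regions; }
    public void visitOccupied(RegionVisitor visitor) {
        for (long[] r : regions) visitor.visit(r[0], r[1]);
    }
}
```

A garbage collector would be just another RegionVisitor, so it does not need to know which kind of allocator it is walking over.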
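The usage pattern might look like the sketch below. The annotation declared here is a local stand-in for org.jembryos.util.FreeAfterSet, so its retention and target are assumptions, and the Holder and Child classes are hypothetical. On a standard JVM the annotation has no effect; the comments describe what the jEmbryoS compiler is said to do.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Local stand-in for org.jembryos.util.FreeAfterSet; retention and target are assumptions.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface FreeAfterSet {}

final class Child { int value; }

final class Holder {
    @FreeAfterSet Child child;   // the overwritten value is freed on every assignment
}

final class FreeAfterSetUsage {
    static void demo() {
        Holder h = new Holder();
        h.child = new Child();   // step 1: instantiate and remember the object in an annotated field
        h.child = null;          // step 2: under jEmbryoS this assignment frees the Child
    }
}
```

If Child had object fields of its own, they would have to be annotated and overwritten in the same way before the Child itself is released.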

The intercepter emitter class is placed in the package org.jembryos.memory.gc in the project jEmbryoS.

In short: There are three types of garbage collectors in jEmbryoS. The collectors are targeted at different environments and no ideal solution is proposed. All collectors work the same way with some minor differences. The general algorithm is - traverse the memory allocator's occupied regions and mark them as unreachable, trace the object graph and mark all traced objects as reachable, then traverse the occupied regions again and reclaim the memory of the unreachable objects.

Hardware management

Hardware management in general is performed at two points of the system's life - when the system starts, and when the system has been initialized and gives control to the user interface. The first phase applies to a small and most essential subset of the hardware, and the second touches all other devices. Currently the first phase initializes the interrupt controller and the programmable interval timer. There should be more serious work done at the first stage, including, in the first place, exhaustive hardware detection, but given the limits of a hobby developer's time it was considered acceptable not to implement all the possible hardware support. The intent of the second phase is to ensure that the user selected drivers start working on the detected hardware. Currently only the keyboard and text screen drivers are implemented. This is enough to have an interactive system but, of course, it is an absolutely insufficient number of drivers for a good system.

Another way to classify hardware management is by the access method. First, the hardware can be accessed just by reading from and writing to some address in memory. But not all hardware supports this kind of interaction. The second way is to use another address space, called 'IO space' (input-output space). IO space is like memory with its own addresses, but with a different terminology: in IO space the address is called 'a port number'. A port can be read or written exactly like memory. From the Java point of view, the array access analogy is not very convenient for implementing port access (although it can be used) and, on the other hand, representing memory as a byte array is also unsuitable because of the array size limitation. If both kinds of access are implemented as Java functions instead, the result is convenient and has no such limitations. Also, for those who already have experience with IO space, a function as a port accessor is the more familiar solution.

In short: The hardware is managed by the system and by the drivers. The system manages just the essential part of the hardware and the drivers manage everything else. For hardware management there are standard memory access and port access facilities.

Interrupts

Hardware can inform a program about an event in two ways - the program can poll the hardware and decide when it is suitable to work on the event, or the hardware can tell the program that an event has happened at any time the hardware finds suitable. The second way is more efficient but requires some additional work from the programmer. jEmbryoS, like many other systems, uses the second approach.

Currently, when the system starts, it initializes the interrupt controller in such a fashion that only two kinds of interrupts can be delivered - the timer interrupt and the keyboard interrupt. When an interrupt fires, the processor stops the currently running task and jumps to the interrupt handler procedure. The handler procedures are implemented as native methods and their addresses are given to the processor in the form of the interrupt descriptor table. The native form of the handlers is a consequence of its much greater efficiency in such simple cases, but when relatively complex processing is needed, a usual Java method is called from the machine code.

The timer interrupt gives the system its multitasking capability. When the timer fires, the native code saves the currently running context and restores the JVM's context; it then calls the context scheduler's onTimer method, which now works under the JVM context. The JVM context is chosen to keep any possible problems in other contexts from disturbing such an essential function as task management. The scheduler then selects the next task to be run and returns the task's address. The interrupt handler saves the JVM context and restores the context whose address it got from the scheduler. Then the handler finishes the interrupt by executing the iret instruction.

An additional function of the timer interrupt is to provide a means of measuring time intervals. Every time the scheduler is invoked it increments its tick counter. The counter is available to other classes: remember its value at the start of the measured interval, and subtracting that value from the counter value at the end of the interval gives the interval in timer ticks. The current implementation uses approximately 200 microseconds as the timer tick interval.

The other interrupt notifies the system about keyboard events such as key down or key up. When the interrupt fires, the handler just updates the interrupt table and returns. Then the interrupt monitor task reads the table and, if there is a sign of an interrupt, runs the corresponding driver. This is not an optimal solution; it would be more efficient if the interrupt handler just told the scheduler that a driver task should be unblocked. But if the driver is still working, there is nothing to unblock. If we try to update a flag which tells the driver to process another interrupt, there is a chance that the driver is finishing its work and has already checked the flag, so we can miss the interrupt. This means there should be a thread that monitors when the driver finishes and then unblocks the driver. And this is almost exactly the situation implemented in jEmbryoS - a thread is used to monitor interrupts, but without the unblocking part of the work. Here we have a simple solution, but a less efficient one than is possible.

It is worth noting that there is a pending issue of changing the interrupt controller. Currently the programmable interrupt controller (PIC) is used, but there is also the advanced programmable interrupt controller (APIC), which has extended functionality. After the change it will be possible, for example, to simplify the development of multiprocessor support. But as long as there is no such support, the need for the APIC is not urgent.
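The interval arithmetic is simple enough to write down. The sketch assumes a tick of exactly 200 microseconds (the text says approximately), and the class and method names are hypothetical; how the tick counter itself is read is not shown.

```java
// Sketch of interval measurement in timer ticks (names are illustrative).
final class TickIntervalSketch {
    static final long TICK_MICROS = 200;   // assumed exact; the text says approximately 200

    // Converts the difference between two tick counter readings to microseconds.
    static long elapsedMicros(long startTick, long endTick) {
        return (endTick - startTick) * TICK_MICROS;
    }

    // Number of timer ticks in one millisecond at the assumed tick length.
    static long ticksPerMillisecond() { return 1000 / TICK_MICROS; }
}
```

With these numbers a difference of 5 ticks corresponds to one millisecond, so that is also the finest resolution such a measurement can be trusted to.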

The class involved in interrupt servicing is located as follows:

The class InterruptHandler has native methods of the form isr0xYY, where YY is an interrupt number. The interrupt number is shifted by 0x20 (32 in decimal) to leave the lower interrupt numbers for the purposes predefined by Intel.

In short: There are two interrupts enabled - the timer and the keyboard interrupts. The first is used for task switching and the second for key press processing. All other interrupts can be enabled easily if required, but there are no drivers to handle them.

Driver architecture

A driver is an entity that raises the abstraction level: it hides the details of the hardware interaction and exposes just a simplified model instead of the full picture. As such it seems promising to expose drivers directly for application use. But there is a problem with this approach - as the means of access to very important functionality like screen or keyboard interaction (how good is a system without a screen ?), the drivers should be as reliable as possible. Are the drivers so cool ? Often they are not. Another point here is security. A malicious driver with access to the system services can do a lot of harm. A malicious application can also use a good driver to do very bad things. With all this in mind, our options range from exposing drivers directly to hiding everything within a shell and allowing only tightly controlled kinds of access. With the high control option we will be unable to manage some kinds of special hardware, just because we do not know what hardware somebody will create a day after we have released our system. With the direct exposure option we will have reliability and security problems.

A compromise within this range can be found by splitting the problem into its parts. A system can hide the things it knows about and carefully expose the things it doesn't know. There is also a ready to use and very simple solution - the system can ask the user and pass all the responsibility to him or her. But in order not to annoy users too often, it is preferable to be prepared for yet unknown hardware.

In jEmbryoS a driver defines its hardware access requirements. The requirements include memory ranges, IO space ports and interrupt numbers. All those resources are transferred to the driver in the form of accessors. Before the transfer, the system has an opportunity to look at the required resources and draw some conclusions about how safe it is to let an untrusted entity access them. If safety is confirmed, the resource accessors are created and transferred to the driver. When the driver uses the resources, the accessors give the system another opportunity to look at the driver's actions. If, for example, a keyboard driver tries to access the keyboard controller's bit responsible for resetting the computer, the system can prevent such misbehavior regardless of the reason behind it (malicious purpose or a bug). This is the 'input' part of driver management.

But there is also an 'output' part - the driver can misbehave in different manners. It can crash, it can log user actions to a file, or it can hang in an infinite loop. A driver crash or hang can be isolated by using a separate thread for every driver, but that can be too expensive if a system has, for example, a hundred drivers. It seems a thread pool can solve the problem of a large number of drivers, because drivers sleep a lot and not many of them have work to do at the same moment. Another thread should monitor driver activity and kill a driver thread after some timeout expires, which prevents drivers from hanging forever. After a crash or a kill the driver can be restarted and continue its work (although it can lose some user settings). This way is suitable for isolating driver bugs, but it doesn't work against a malicious driver. It is possible to monitor some suspicious driver actions like hard disk or network access. If the driver runs in the context of a well-known thread, then it is just a matter of time to develop very simple intercepters that monitor suspicious access from that particular thread. But generally there can be many exploits the system doesn't know about. This means that driver installation should always be confirmed by the user, who serves as an additional filter in the way of malicious software. When the user is asked, information about the driver's purpose and required resources can help determine whether there is a real security threat, so the system should ask in an exhaustively informative manner.

Now the application side of driver management. From the point of view of an application developer it is preferable, for simplicity, to deal with as few drivers as possible. The system offers a service solution for this need. The service layer provides a number of services to the applications, including services with drivers wrapped inside. This not only minimizes the exposed complexity, but also allows the system to control driver behavior more tightly - by reducing the number of ways a malicious application can use a driver, by controlling the driver response time, by checking for exceptions in case the driver has bugs, and so on. But such an approach limits access to a predefined set of capabilities only. That's why there should be a possibility to expose user confirmed drivers directly to user confirmed applications. But generally driver access should be wrapped under the hood of the service paradigm.
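A resource accessor of this kind could be sketched as follows. The class is hypothetical and the port IO is simulated with an array so that the example runs on a standard JVM. The concrete rule is also only an illustration: on PC hardware the keyboard controller listens on ports 0x60 and 0x64, and writing the command 0xFE to port 0x64 pulses the processor reset line, so the sketch denies exactly that write.

```java
// Hypothetical accessor: the driver may touch only the ports it requested, minus vetoed actions.
final class GuardedPortAccessor {
    static final int KBD_COMMAND_PORT = 0x64;      // keyboard controller command port
    static final int KBD_CMD_PULSE_RESET = 0xFE;   // controller command that resets the CPU

    private final int firstPort;
    private final int lastPort;
    private final int[] simulated = new int[0x10000];   // stands in for real port IO

    GuardedPortAccessor(int firstPort, int lastPort) {
        this.firstPort = firstPort;
        this.lastPort = lastPort;
    }

    int read(int port) {
        checkRange(port);
        return simulated[port];
    }

    void write(int port, int value) {
        checkRange(port);
        if (port == KBD_COMMAND_PORT && (value & 0xFF) == KBD_CMD_PULSE_RESET) {
            throw new SecurityException("reset command denied");   // second check, at use time
        }
        simulated[port] = value & 0xFF;
    }

    // First check: the port must lie inside the range granted when the accessor was created.
    private void checkRange(int port) {
        if (port < firstPort || port > lastPort) {
            throw new SecurityException("port 0x" + Integer.toHexString(port) + " was not granted");
        }
    }
}
```

Because the driver never receives a raw port function, both checks run on every access regardless of whether the misbehavior is a bug or an attack.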

And now about the state of driver affairs in jEmbryoS. Generally it resembles the picture described above, but due to the very limited number of implemented drivers and the embryonic state of the system there are important differences. After a device raises its interrupt line, the current version of the system just confirms that the interrupt is serviced and updates the pending interrupt table with a sign that an interrupt has happened (it just adds one to the table item); then the interrupt handler finishes. The interrupt monitor task performs an infinite loop and, when awakened, checks for pending interrupts in the system table. If a pending interrupt is found, its handler driver is invoked. After all pending interrupts have been serviced, the monitoring task goes to sleep. Currently there are no failure recovery or malicious software detection capabilities.
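The split between the handler and the monitor task can be sketched like this. The class is not the real InterruptMonitor: the table size of 16 entries, the use of Runnable as a driver and the decision to invoke the driver once per counted interrupt are all assumptions. The sleeping and waking of the monitor task is left out, so only one pass of its loop is shown.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Sketch of the pending interrupt table shared by the handler and the monitor task.
final class PendingInterruptsSketch {
    private final AtomicIntegerArray pending = new AtomicIntegerArray(16);
    private final Runnable[] drivers = new Runnable[16];

    void register(int irq, Runnable driver) { drivers[irq] = driver; }

    // What the handler does: add one to the table item and return immediately.
    void onInterrupt(int irq) { pending.incrementAndGet(irq); }

    // One pass of the monitor loop; returns the number of driver invocations.
    int servicePending() {
        int served = 0;
        for (int irq = 0; irq < pending.length(); irq++) {
            int count = pending.getAndSet(irq, 0);   // take the counter and clear it atomically
            if (drivers[irq] == null) continue;      // no driver registered for this line
            for (int i = 0; i < count; i++) {
                drivers[irq].run();
                served++;
            }
        }
        return served;
    }
}
```

Taking and clearing the counter in one atomic step means an interrupt that arrives during the pass is counted for the next pass instead of being lost.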

The InterruptMonitor class is located in the project jEmbryoS in the package org.jembryos.jvm.

After a driver has processed an interrupt, some applications may want to know about it. The way this is currently implemented is as follows. The application asks the service layer to add an event listener provided by the application. The service layer selects the appropriate driver wrapper and asks it to add the listener. The wrapper creates a helper that converts the driver's data to the model of the provided listener and registers the helper as a hardware listener with the system layer. The system layer ensures the current task has a separate thread to process events and adds the received helper to that thread's listener list. From then on the thread monitors the drivers it has registered, and when a driver has a pending event the thread calls all registered listeners. In fact, the service layer currently exposes no listener addition methods to the applications, because all work with drivers is done entirely within the service layer, in particular by its console part, but it is not a problem to create an appropriate method when needed.

It is worth noting that this approach is a simplification of interrupt processing. The thread responsible for event notification uses a loop to monitor drivers instead of more efficient solutions. For example, a driver could block the thread until there is an event and unblock it when the event is available. But that requires some additional work that is not done yet.

The definition of driver framework classes is located as follows:

The location of drivers is as follows:

In short: The driver architecture in jEmbryoS defines a number of driver wrappers that intercept a driver's activity. These are the resource access intercepters and the application interaction intercepters. In the case of a driver of a yet unknown type it is possible to circumvent the application interaction intercepter. For interrupt listening drivers there is a framework that allows an application to listen to interrupt driven events.

Task management

Task management relies mostly on the context scheduler. When an application is running it has a number of important objects defined, and without those objects the application just cannot run. Examples of such objects are the stacks, the memory, the garbage collector and so on. Together those objects represent an execution context. If the system is supposed to do several things simultaneously, there should be a context switch when one application is replaced by another. The jEmbryoS way of context switching is called preemptive multitasking. It means an application is forced to stop at an arbitrary point of execution and another application continues its work in place of the first one. Such a switch can be implemented in many ways, but when all applications are equally important there is no point in allowing one application to work much longer than another. The obligation to watch over equal access of applications to run time is placed on the context scheduler.

Currently the context scheduler, besides distributing run time, manages the state of an application while it is not running in the following manner - an application can sleep for a particular number of milliseconds, it can be blocked until another application unblocks it, or it can just wait for its turn to run. The state of applications is updated in two situations - when the timer interrupt fires and when an application tells the scheduler that it wants to sleep or to yield its run time.

When the timer interrupt fires, the onTimer method of the context scheduler is called. The scheduler then determines whether it is time to check if a new task should be run. If it is, the scheduler looks at the queue of commands it has received since the last timer interrupt. A command tells the scheduler that a particular task should be blocked, killed and so on. The scheduler changes its internal state according to the processed commands and then looks at the list of tasks it manages. If the running task has run for too long, or there was a command to change the task's state, or there is a high priority task waiting to be run, the scheduler updates its state and returns a newly selected task to the interrupt handler. The handler restores the task's context and finishes. If there are no tasks waiting to be run, the scheduler returns null and the handler halts the processor until the next timer interrupt fires.

Another way of changing the application state is a task's decision to go to sleep or to yield the run time it uses. When such an event occurs, the scheduler is asked to return a new task to be run. The scheduler repeats the sequence it executes when the timer interrupt fires, then returns a new task, or null if there is no task to run. Next, special code restores the context of the new task, or halts the processor if the new task is null.

The two described approaches allow a relatively efficient way to switch contexts. But there is an issue with the timer interrupt handling time when the scheduler's queue of commands is big enough. The interrupt should be handled quickly, because another interrupt can happen in the meantime, and if the scheduler works for too long a timeout can expire for that second interrupt and it can simply be missed. How long such a timeout can be I just do not know - there are a lot of devices with very different required reaction times. But for the current implementation the existing approach works very well. If problems appear, the scheduling part of the timer interrupt handling should be moved to a new separate task.

The scheduler supports task prioritization. It is implemented strictly: if there is at least one task with a priority higher than another task's, the lower priority task never gets access to the run time while the higher priority task is ready to run. This means that high priority tasks should sleep as often as possible to let the other tasks run. Sometimes this can be a problem, but it can be circumvented relatively easily. And (as usual) such an approach to task prioritization is very simple to implement.

The current implementation uses a one millisecond interval between scheduler state updates and 200 microseconds between timer interrupts. This allows the scheduler state to change no later than 200 microseconds after a scheduler command was issued. Such a time seems better than 1 millisecond :) But in fact the interval between timer interrupts is an open issue. There is an upper limit - the Java API requires the Object wait methods to take a time parameter in milliseconds, and it is a good idea to make the scheduler accurate enough to allow a task to wait for exactly one millisecond as a minimum. But practically there are no rigid constraints here unless something requires a very short reaction time to task switch commands. Even the current implementation of the driver event listeners, with its double waiting loops, becomes just slightly slow when run on an emulator whose speed is at least a hundred times worse than that of the system run on an ordinary computer.

It is worth noting that when the scheduler has no new commands and one millisecond has not passed since the last scheduler run, the timer interrupt handler just updates the tick count and finishes. It can also be useful to know what happens when a task issues a scheduler command request, the timer interrupt fires at the same time, and that time is exactly when the scheduler should run - then the one millisecond accuracy is not maintained and the current timer tick just finishes after the tick count update. This is done to protect the scheduler's internal state from corruption by a simultaneous update. Another source of scheduler inaccuracy is a running JVM context. This is a relatively rare event: some task needs to do something that implicitly creates a new object for the JVM's use, and the JVM context is then used to ensure that the object is allocated in the JVM's memory and is not reclaimed by the garbage collector as it would be if allocated within the task's memory. In such cases the timer interrupt handler just returns without even updating the tick counter. But none of these inaccuracies should be an issue, because the timer interrupts are 5 times more frequent than the scheduler runs.
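The onTimer sequence can be sketched in a heavily reduced form. Only the method name onTimer comes from the text. Tasks are represented by plain strings, the only supported command is blocking a task, priorities, sleeping and unblocking are left out, and the time slice of 5 ticks (5 x 200 microseconds) is taken from the numbers given elsewhere in this section.

```java
import java.util.ArrayDeque;

// Reduced sketch of a round-robin context scheduler driven by timer ticks.
final class SchedulerSketch {
    static final int TICKS_PER_RUN = 5;   // 5 ticks of 200 us make the 1 ms scheduler interval

    private final ArrayDeque<String> commands = new ArrayDeque<>();  // tasks asked to be blocked
    private final ArrayDeque<String> ready = new ArrayDeque<>();     // tasks waiting for run time
    private String current;    // the running task, or null when the processor is halted
    private long ticks;
    private long lastRun;      // tick of the last full scheduler run

    void addTask(String task) { ready.add(task); }

    // Commands are only queued here; they take effect on the next timer tick.
    void block(String task) { commands.add(task); }

    // Returns the task to run after this tick; null tells the handler to halt the processor.
    String onTimer() {
        ticks++;
        boolean sliceExpired = ticks - lastRun >= TICKS_PER_RUN;
        if (commands.isEmpty() && !sliceExpired) return current;   // nothing to do on this tick
        if (sliceExpired) lastRun = ticks;
        String blocked;
        while ((blocked = commands.poll()) != null) {              // process queued commands
            ready.remove(blocked);
            if (blocked.equals(current)) current = null;
        }
        if (current != null && sliceExpired) {                     // the task ran long enough
            ready.add(current);
            current = null;
        }
        if (current == null) current = ready.poll();               // null when no task is ready
        return current;
    }
}
```

With two tasks added, the first one is selected on the fifth tick and the second one five ticks later, while a block command is honoured on the very next tick.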
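Strict prioritization amounts to a very small selection rule, sketched below. The Task class, its fields and the pick method are hypothetical, and a larger number is assumed to mean a higher priority.

```java
import java.util.List;

// Sketch of strict priority selection among tasks that are not sleeping.
final class PrioritySketch {
    static final class Task {
        final String name;
        final int priority;   // assumed: a larger value means a higher priority
        boolean sleeping;
        Task(String name, int priority) { this.name = name; this.priority = priority; }
    }

    // A runnable task with a higher priority always wins; null when every task sleeps.
    static Task pick(List<Task> tasks) {
        Task best = null;
        for (Task t : tasks) {
            if (t.sleeping) continue;
            if (best == null || t.priority > best.priority) best = t;
        }
        return best;
    }
}
```

The low priority task is picked only while the high priority one sleeps, which is why high priority tasks are expected to sleep often.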

Now here is the location of the context scheduler:

And here is the location of the context switch helper for yield and sleep request processing:

In short: Task execution is managed by the context scheduler. The scheduler updates its state on timer interrupts and returns a new task to run, or null if there are no suitable tasks. Another way to update the scheduler's state is a sleep or yield request, which requires no timer interrupt.

Security

The security is not required for the current version of the system, but the framework for it's enforcement is here. Generally the software security is ensured if no piece of a software can do an illegal action. There are open issues of what is illegal and how to ensure the illegal can not occur. The jEmbryoS way (as many others too, of course) is to define what is legal and to prevent an application from doing something else. What is legal is determined by the JVM specification and class libraries available to the application. With the help of the Java language it is relatively easy to ensure there is nothing possible to access outside the predefined set of libraries. First - there is no pointers in Java. This means if there is no object reference passed to the application it just can not use the object directly. The service front end approach used in jEmbryoS ensures that only selected object references can be passed to an application. Another point in the service front end is a possibility to intercept any service call and check permissions of the caller.

But such a front end solves only the object reference problem. There are more issues to solve. An application can access static fields and methods of any class, which can be dangerous. To prevent an application from using static elements of a class, project independence was implemented in such a way that the system layer classes are not available to other projects at all. When the system image is built, the builder automatically ensures that there are no references to classes inside the system layer from other projects. The system image builder also ensures that no system classes are available to the application class loader. But the system layer exposes a bit of dangerous functionality to the service layer. To prevent indirect access to the system layer functionality there is, first, the service layer interface, which is the only thing the system injects into an application class. The interface, as was the case with the system layer, allows careful object selection to be implemented. Second, while the service layer is small it is easy to ensure that it has no public, protected or package level static fields, and this works for the current state of jEmbryoS. But future versions should have a solution similar to the system layer, making the service layer completely unavailable to applications.

The next security hole is the application bytecode that the system has to support. In its current state the bytecode validation is minimal, and this can be exploited to penetrate the system. To close this hole a good bytecode validator has to be developed.

Now about the Java security architecture. There are a lot of permission checks in a standard JVM implementation, but in jEmbryoS all such code was commented out because only a limited subset of the Java API is supported. Having all permission checks disabled is not a good security solution. In future versions all permission checking should be restored once the required subset of the Java API is added to the system. But while the permission checks are disabled, a relatively simple solution is to disable Java reflection for private, protected and package level class elements.

In short: The jEmbryoS is a security ready solution, but some additional work is required to move it into the security ensured category. The currently implemented parts of the security framework are the system and service layer front end interfaces and the project isolation scheme. The features still to be implemented include service layer project isolation, bytecode validation and introspection limitation.

Java API compliance

The Java virtual machine specification is implemented up to version 1.6, so jEmbryoS can be called JVM specification compliant. But full Java compliance also requires support for the standard Java API.

There is no full support for the Java SE API; only a limited subset of the API exists. This is a consequence of the embryo state of the system. For example, the class File cannot be implemented because no file systems are supported, there are no HDD drivers and none of the other things used by the classes of the file access API exist. That is why only essential classes are supported in jEmbryoS. The full list of supported classes can be found in the jEmbryoJavaAPI project under the src folder.

In particular, there is no support for standard threads. Threads exist in jEmbryoS, but additional work is required to make them fully compliant with the standard API. Applications can still use the service level interface to create threads, although not in the habitual manner. The example code from the jEmbryoApp project, the class SyncTest in particular, can be used as a reference for how to create threads in jEmbryoS.

In short: The jEmbryoS has the JVM specification implemented, but all the other standard Java stuff remains to be done.

Technologies

A number of technologies were developed for the jEmbryoS project. Some of them can be used outside the jEmbryoS.

Assembler

The need for low level hardware access was the motivating factor for the Java Assembler development. It was decided that an assembler-like implementation of the low level functionality is more suitable for system development than other approaches like functions or something else.

The Java Assembler is implemented as an independent project named jEmbryoAssembler. There are a number of machine independent classes under the root package of the project. All other classes are x86 architecture specific. There are also two classes from the jNode project.

The idea of the Java Assembler is that Java code written with it should look like assembler. To make this the case the following steps were taken: the AssemblerProgram class was created with all assembler instructions as fields, every group of similar assembler instructions was implemented as a separate class, and an indirect addressing helper was implemented. As a result the Java syntax mimics the assembler syntax as closely as possible.

Now the process of writing an assembler program in the Java Assembler looks as follows:

First, an AssemblerProgram instance should be created. If the instance belongs to a subclass of AssemblerProgram, there is an opportunity to write programs that are closest to assembler. In a method of the AssemblerProgram subclass it is enough to write the instruction name (for example, mov) and, after pressing the dot key (.), select one of the instruction variants. The instruction variants most often have the form xYY, where x is a prefix and YY is the number of bits the instruction operates on. The x prefix is there only so that the Java compiler does not reject the name as an incorrect method name. After the number of bits is chosen, the standard assembler arguments follow. An argument can be a register, an immediate number or an indirect addressing construct. In some cases a label can be used as an argument. To use a register argument you just write the standard register name in upper case; it can even be written in lower case, and (in many IDEs) pressing ctrl+space will offer the upper case version. If a number is required, just write the number. If an indirect addressing construct is required, type a lower case 'i' and within the round brackets write a register name, a number or a construction like this: i(EAX,EBX,Scale.S4,123456). The last construction represents an indirect address calculation with the plus and multiplication signs replaced with commas and is equal to [EAX+EBX*4+123456].
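To make the indirect addressing notation more tangible, here is a tiny stand-in for the helper that renders the Java form back into the usual assembler notation. It is not the jEmbryoAssembler implementation: the Register and Scale enums and the i(...) overloads below are simplified assumptions that only mirror the names used in the text.

```java
// Hypothetical sketch: simplified stand-ins for the jEmbryoAssembler types.
final class IndirectAddressSketch {
    enum Register { EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP }

    enum Scale {
        S1(1), S2(2), S4(4), S8(8);

        final int factor;

        Scale(int factor) {
            this.factor = factor;
        }
    }

    /** [base] */
    static String i(Register base) {
        return "[" + base + "]";
    }

    /** [address] */
    static String i(int address) {
        return "[" + address + "]";
    }

    /** [base+index*scale+displacement] */
    static String i(Register base, Register index, Scale scale, int displacement) {
        return "[" + base + "+" + index + "*" + scale.factor + "+" + displacement + "]";
    }

    public static void main(String[] args) {
        // The commas of the Java form become plus and multiplication signs again.
        System.out.println(i(Register.EAX, Register.EBX, Scale.S4, 123456));
        System.out.println(i(Register.ESI));
    }
}
```

The real helper has to produce operand encodings rather than strings, but the mapping between the comma separated Java arguments and the base, index, scale and displacement parts is the same.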

To translate a Java Assembler program into machine code it is enough to call the AssemblerProgram method getProgram(). It returns an array of bytes with the assembler instructions represented as machine code bytes. If the AssemblerProgram instance was created with a DebugStream instance, the program is printed when getProgram() is called. This can be useful for debugging purposes because the address of every instruction is printed and the jump distances are calculated.

And now the Java Assembler technology usage in jEmbryoS. All native methods are implemented with its help. The native method implementation is placed very close to the native method definition to resemble the standard Java method pattern, where the method name and arguments are followed by the method implementation. But it is also possible to place the implementation of a native method somewhere else. The implementation part is organized as a Java method with zero or one special argument and a return value of a byte array. The special argument is a compiler helper object holding the information and objects required for a method call from the machine code. The return value is the machine form of the assembler instructions, in other words a translated assembler program. For the compiler to recognize the implementation part of a native method, an annotation is used. The annotation class is NativeMethodImplementation and it is placed in the package org.jembryos within the jEmbryoCore project. The annotation takes two arguments: the method name and its descriptor. The name parameter designates the native method name and the descriptor describes the method's arguments and return value according to the JVM specification. Examples of native methods can be found in the same project under the package org.jembryos._native.x86.
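The pairing of a native method with its annotated implementation can be sketched as follows. Please treat this as an illustration only: the annotation below is a local stand-in with assumed element names (name and descriptor), not the real org.jembryos.NativeMethodImplementation, and the hand written bytes stand in for what an AssemblerProgram would return from getProgram().

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical sketch: the annotation is a local stand-in, not the jEmbryoS one.
final class NativeMethodSketch {
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface NativeMethodImplementation {
        String name();        // assumed element name
        String descriptor();  // assumed element name, JVM method descriptor
    }

    /** The native method definition. */
    static native int answer();

    /** Its implementation, placed right next to the definition. */
    @NativeMethodImplementation(name = "answer", descriptor = "()I")
    static byte[] answerImplementation() {
        // x86 machine code written by hand for the example:
        // B8 2A 00 00 00   mov eax, 42
        // C3               ret
        return new byte[] {(byte) 0xB8, 42, 0, 0, 0, (byte) 0xC3};
    }

    /** What a compiler could do to find the implementation of a native method. */
    static Method findImplementation(Class<?> owner, String name, String descriptor) {
        for (Method method : owner.getDeclaredMethods()) {
            NativeMethodImplementation mark =
                    method.getAnnotation(NativeMethodImplementation.class);
            if (mark != null && mark.name().equals(name)
                    && mark.descriptor().equals(descriptor)) {
                return method;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Method found = findImplementation(NativeMethodSketch.class, "answer", "()I");
        System.out.println("implementation method: " + found.getName());
        System.out.println("machine code length: " + answerImplementation().length);
    }
}
```

The descriptor "()I" follows the JVM specification notation for a method with no arguments that returns an int.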

In short: The Java Assembler technology is a way of writing an assembler program in Java. Because attention was paid to the assembler-like look and feel, the resulting Java program differs relatively little from a usual assembler program. The Java Assembler technology is actively used by the jEmbryoS project.

Compiler

For Java code to be run on a processor there should be some means of transforming the Java code into machine code. The means can range from a pure compiler to a pure interpreter. The interpreter approach allows a program to start executing immediately, but usually has worse performance than the compiler approach. The interpreter variant also requires another program to be running to interpret the Java code. The jEmbryoS employs the compiler approach as simple and performance friendly.

The current implementation of the jEmbryoS uses a very simple compilation approach. It just translates the Java bytecodes into their assembler representation and then lets the assembler create the machine code. Such an approach is far from an ideal compiler and is a long way from heavy multi-pass optimization. A compiler built with optimization in mind would be a very different software entity. This means that for the jEmbryoS to deliver its best performance there should be another compiler with only a very weak relationship to the existing one.

However, there is just one compiler now and it is not optimizing. But the current compiler implementation is very simple and works fine as a base for further system development. The compiler is organized as a set of one to one bytecode translators. For every bytecode in the JVM specification there is a corresponding translator. Such an approach splits the compilation into small and manageable parts. An integration class then organizes the parts into a whole. The standard compilation unit of the compiler is a Java method. The organizing class takes the method data as an input parameter and produces machine code for all bytecodes of the method. Method dependencies are managed with the help of a few additional classes. The machine independent part of the compiler is placed within a separate package. All one to one bytecode translators are also placed within a separate package.
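The one to one translator scheme is easy to picture with a toy version. The sketch below is my own simplification and not the jEmbryoS compiler: it knows four operand-less bytecodes, emits assembler text instead of machine code, and the instruction sequences chosen for each bytecode are an assumption made for illustration. Only the opcode values (iconst_1 = 0x04, iconst_2 = 0x05, iadd = 0x60, ireturn = 0xAC) come from the JVM specification.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a toy one to one bytecode translator table.
final class BytecodeTranslatorSketch {
    /** One translator per bytecode; it appends the assembler lines for it. */
    interface Translator {
        void translate(List<String> out);
    }

    static final Map<Integer, Translator> TRANSLATORS = new HashMap<>();

    static {
        TRANSLATORS.put(0x04, out -> out.add("push 1"));          // iconst_1
        TRANSLATORS.put(0x05, out -> out.add("push 2"));          // iconst_2
        TRANSLATORS.put(0x60, out -> {                             // iadd
            out.add("pop ebx");
            out.add("pop eax");
            out.add("add eax, ebx");
            out.add("push eax");
        });
        TRANSLATORS.put(0xAC, out -> {                             // ireturn
            out.add("pop eax");
            out.add("ret");
        });
    }

    /** The integration step: translate a method body bytecode by bytecode. */
    static List<String> compile(byte[] methodCode) {
        List<String> out = new ArrayList<>();
        for (byte b : methodCode) {
            int opcode = b & 0xFF;
            Translator translator = TRANSLATORS.get(opcode);
            if (translator == null) {
                throw new IllegalArgumentException("no translator for opcode " + opcode);
            }
            translator.translate(out);
        }
        return out;
    }

    public static void main(String[] args) {
        // Bytecode of a method body equivalent to "return 1 + 2;" without constant folding.
        byte[] code = {0x04, 0x05, 0x60, (byte) 0xAC};
        for (String line : compile(code)) {
            System.out.println(line);
        }
    }
}
```

The output is sequential and unoptimized, which is exactly the trade described above: each translator is trivial to write and test, while the generated code leaves a lot of room for a future optimizing compiler.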

There are some implementation specific classes worth noting. The compiled program contains some frequently repeated fragments. For the sake of memory efficiency the fragments were separated out and organized as procedures. To separate the object and class data structure information from the method implementation, a special structure helper class named MemoryStructureHelper was created. The exception processing logic was also separated into another helper class.

The compiler is implemented as an independent component and has a separate project completely dedicated to the compiler needs. The most important things about the compiler are highlighted as follows:

In short: The jEmbryoS employs a very simple compiler. The optimizing compiler will differ greatly from the existing one. The existing compiler just uses one to one assembler representations of the Java bytecodes and writes the corresponding machine code sequentially to get the machine representation of a Java method.

Bootloader

For any software system to be run by the processor there should be some means to place the system in memory accessible by the processor and to start processor instruction execution from the correct point. Such means as a whole are called a bootloader. There are different strategies for implementing a bootloader, but because the jEmbryoS way is simplicity, the jEmbryoS bootloader was developed with simplicity in mind.

First there is the boot record loader. By convention, most computers perform some initialization after power on, and when it is complete they load the master boot record (MBR) of the disk into memory at a predefined location. Then the processor jumps to the location where the boot record contents were placed. The jEmbryoS currently has such a master boot record as the first 512 bytes of the system image. When the image is written to a disk or USB flash drive, the image's master boot record replaces the existing one on the disk, and when the starting computer reads the disk the required boot record is loaded into memory.
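A quick sanity check of a boot record can be written in a few lines. On conventional BIOS based PCs a boot sector is 512 bytes long and ends with the signature bytes 0x55 and 0xAA at offsets 510 and 511. The sketch below assumes that the jEmbryoS image follows this convention, which the text above does not state explicitly; the class and method names are made up for the example.

```java
// Hypothetical sketch: checks the conventional BIOS boot sector signature.
final class BootRecordSketch {
    static final int SECTOR_SIZE = 512;

    /** Returns true if the first sector of the image ends with 0x55 0xAA. */
    static boolean hasBootSignature(byte[] image) {
        return image.length >= SECTOR_SIZE
                && (image[SECTOR_SIZE - 2] & 0xFF) == 0x55
                && (image[SECTOR_SIZE - 1] & 0xFF) == 0xAA;
    }

    public static void main(String[] args) {
        // A fake image: one empty sector with only the signature filled in.
        byte[] image = new byte[SECTOR_SIZE];
        image[510] = 0x55;
        image[511] = (byte) 0xAA;
        System.out.println("boot signature present: " + hasBootSignature(image));
    }
}
```

Running such a check against the first 512 bytes of a freshly built image is a cheap way to notice a broken build before writing anything to real media.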

The process shown above is not the standard procedure because of the internal disk structure currently in use. The disk usually has some file system on it, and the file system is separated from the whole disk space within a partition. To make it possible to load one of many operating systems on the disk, the standard boot sequence implies loading the boot sector of a particular OS. For it to be loaded, the master boot record contains information about which partition is active, and the code from the master boot record just looks up the location of the active partition, loads its own boot record and jumps to its start. A partition's boot record is called a volume boot record (VBR). To implement such a scenario with the jEmbryoS it is possible to use the same code as is used for the MBR for the partition's VBR, after updating the offset on the disk from which the jEmbryoS image starts.

After the jEmbryoS boot sector code is loaded and running, it just loads blocks from the disk using a predefined offset and places them at a predefined memory location. Because it is not convenient to change the boot record too often, another piece of booting software was introduced. This piece is loaded by the boot sector code, and with that the responsibility of the boot sector code ends. Next, the second stage bootloader loads the system image and initializes the processor to work in the proper mode. Then there is a jump to the first instruction at the system image start. The system image assumes it is within a predefined environment and does not care about its initialization except for one thing: the interrupt vector table. Because up to 256 interrupts can be defined and all of them should have a corresponding implementation within the system image, it is not suitable to assign the responsibility for interrupt initialization to the bootloader code. The system also initializes the interrupt controller and the programmable interval timer. After that is done a jump to the first JVM method is performed.

The location of the bootloader code and related stuff is as follows:

For the compiler to be as simple as possible, complex procedures should be implemented in Java. Of course, the implementation of the procedures is also compiled into a machine representation, but if the implementation pays attention to the cases of recursion into itself it is possible to use it in combination with the compiled code of a method. For such use, method invocation code is added to the compiler implementation. There is also an interface defined for the set of methods called from the compiled code. To keep the implementation of the interface stateless, static methods were selected as the implementation strategy, and the interface serves for information purposes only. The classes responsible for the implementation of the methods called from machine code are as follows: There are two implementor classes, which makes it possible to use the approach within the system initialization code or in possible simplified implementations of the system for some special purposes.

In short: The jEmbryoS is loaded using a two stage boot loader. The first part is responsible only for loading the second part. The second part is responsible for loading the system image and for initializing the environment to suit the needs of the system.

System image

The system image preparation is an important procedure which ensures that everything the system needs to run properly is present. The important components are the class data structures, the method implementations and the JVM instance. Everything else can be initialized at runtime, but when it is simple to initialize something at build time, it is included in the system image.

The system image build classes are placed within the project jEmbryoBootstrap. The central point of the image build is the main method of the Starter class. The build sequence consists not only of the system image build itself, but also includes file system preparation. FAT32 was chosen as the file system. The boot record is placed within the file system and the file system structure data is written. After the file system is ready, the system image is appended to it as an ordinary file. Another ordinary file is also added: the sample application jar file. The boot loader loads a few more blocks than the system image occupies, so if something else sits right after the image it is also loaded into memory. This means that simply copying the jar file to the distribution media right after the image is enough for it to be loaded and used by the system. This gives the nice feature that the sample application jar can easily be replaced by any interested user.

The system image can be built in three forms: a standard (or full) image, a bootstrap image and a simplest one. The first is the default image type and contains all of the jEmbryoS. The second is an experimental case for use in memory constrained systems, with the simplest, non-reclaimable memory and other simplifications. The second form is not ready to be used and requires some additional work, but it seems that in the end it can fit within the first 640 KB of the standard memory layout of x86 based machines, while also leaving some space for an HDD driver, for example. The third form is the smallest and simplest Java system that can run on an x86 compatible processor. It completely lacks memory allocation and deallocation facilities, but it can use a memory array and any other objects predefined at build time. It is also possible to hold the required information in long, int, short or byte numbers. Although a system that does not support object creation (including exceptions) is not Java compatible, it requires only one and a half kilobytes for the code of a 'hello world' system. This is not assembler-like performance, but with the help of a future optimizing compiler the memory footprint can be reduced significantly. And even in its current form the simplest kind of the Java system can fit within the tiny SRAM of very cheap microcontrollers and represents an interesting alternative to the existing technologies of microcontroller software development.

To run the image build application you need to tell it where to get its input and where to place its output. The input is taken from the default project layout. The default layout assumes that all projects are in the same directory. If that is the case, the input is correct. Next there is the output. The output includes a number of system image files and possibly their content as separate files. The path to the output should be given in the build.properties file in the jEmbryoBootstrap directory. With the input and output defined it is possible to run the build program and get a correct result.

In short: The system image contains all the data required to run the jEmbryoS. It mostly consists of class information and the machine representation of methods. The system image can be built in three forms: the standard, the bootstrap and the simplest. Each form can be suitable for a particular kind of target system, such as a standard PC, a boot loader or a microcontroller.

Debugging

The existence of the trace stack makes it possible to debug the jEmbryoS almost like an ordinary Java application. While there are still some problems with the debugger technology, it nevertheless delivers superior capabilities compared to assembler level debugging.

There is a separate project for debugging the jEmbryoS. The project name is jEmbryoDebug. It defines a user interface front end that exposes an almost standard set of debugging capabilities. The main class of the debugger is located as follows:

The debugger uses the Qemu emulator to run the system. The Qemu emulator is accessed over TCP/IP using the GDB protocol. It is possible to debug the system directly on a running machine, but some additional modules (like a COM port driver) would have to be installed to make on-machine debugging possible. Debugging on a machine also requires some wires and, obviously, a second machine. This is a bit inconvenient and such an approach is not implemented yet. However, debugging using Qemu is slower than using a machine and often gives different results because of hardware differences. The system under Qemu works at least 30 times slower, but actual debugging with step by step program tracing works relatively quickly.
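For readers who have not seen the GDB remote protocol, its packet framing is simple enough to show here. Every command travels as "$payload#xx", where xx is the sum of the payload bytes modulo 256 written as two hexadecimal digits. The sketch below shows only this framing; the class name is made up, escaping of special characters inside the payload is ignored, and nothing here is taken from the jEmbryoDebug sources.

```java
import java.nio.charset.StandardCharsets;

// Sketch of GDB remote protocol packet framing; the class name is made up.
final class GdbPacketSketch {
    /** Wraps a command payload as "$payload#xx" with a modulo 256 checksum. */
    static String frame(String payload) {
        int sum = 0;
        for (byte b : payload.getBytes(StandardCharsets.US_ASCII)) {
            sum = (sum + (b & 0xFF)) & 0xFF;
        }
        return "$" + payload + "#" + String.format("%02x", sum);
    }

    /** Checks the framing and checksum of a received packet. */
    static boolean isValid(String packet) {
        int hash = packet.lastIndexOf('#');
        if (!packet.startsWith("$") || hash < 1 || packet.length() != hash + 3) {
            return false;
        }
        return frame(packet.substring(1, hash)).equalsIgnoreCase(packet);
    }

    public static void main(String[] args) {
        System.out.println(frame("g")); // read the general registers
        System.out.println(frame("s")); // single step
        System.out.println(frame("c")); // continue execution
    }
}
```

A debugger front end writes such packets to the TCP socket, waits for the '+' acknowledgement and then parses the framed reply in the same way.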

To run the debugger, Qemu should be installed. Qemu installation consists of copying its files to a selected directory, but this applies only to a precompiled version. Normally Qemu is distributed as source code and requires knowledge of the Linux C compiler environment or of its counterpart on Windows. My version of Qemu was precompiled and I had no problems because of that. Another tool the debugger requires is the Nasm assembler. Actually only its disassembler is used, but it is better to get the whole thing than possibly non working parts. With all this software in place, the debugger has to be told where the software is located. For this purpose there is the file debug.properties in the debug project folder. It should be updated properly. For the disassembler output to be more informative, the image addresses file is also used. Normally it is located under the jEmbryoBootstrap project, and if all jEmbryoS projects are in the same directory the file should be found automatically. And finally, the jEmbryoS projects should be available to provide the debugger with the system source code. After all dependencies have been checked you can run the main debugger class as a standard Java class with a main method.

After start the debugger attempts to start Qemu. After Qemu is started, the debugger establishes a connection to it and waits until the system image is loaded. When the image is ready the debugger window is shown. The debugger window has three main areas with two groups of controls above them. The left area shows the current invocation stack trace. The right area is used for different purposes, but most often it displays the variables of the current method. The center panel shows the source code. Above the left panel there is a group of execution control buttons. To step in use the 'In' button, to step over use the 'Over' button, to step out use the 'Out' button. If the system hangs or disturbs you in some other way, it can be restarted using the 'Restart' button. After the button is pressed Qemu is restarted and the debugging session begins from its initial state. To let the system skip some piece of code you can press the 'Run' button, and the emulated system will run until a break point is met or you restart it. But there is an issue with break points.

It is ineffective to have standard break points selected by a mouse click on a line in the source code. The interaction with Qemu imposes a heavy overhead on the debugger when debug events have to be filtered too often, and exactly this situation arises with mouse selected break points. The GDB protocol has a limited set of debugging capabilities and does not allow conditional break points. This means that all conditions have to be checked on the client side, and if an unconditional break point is hit very often the performance becomes very bad. Another closely related problem is the lack of information about the address of every line in every source code file. If it were available, mouse triggered break points could be implemented effectively, but this requires compiler improvements to collect the address of every line and save them in a file to be used by the debugger. It is not a lot of work, but the required update of the compiler and debugger is not done yet. For now (without the line address information) a line break point implementation would have to monitor updates of a particular trace stack position and filter out all lines except the target. If the stack trace depth of the required position is often reached before the required break point is hit, the communication overhead will slow the debugging down very significantly.

Because of the issue described above, break points are implemented in a simplified manner with an explicitly defined break address. Using the imageAddresses.txt file from the jEmbryoBootstrap project it is possible to find the target method address and copy it into the debugger's address field. This is not very convenient, but from my experience it is also not a very pressing issue compared to the complexity of searching for the reason why one of the simultaneously running tasks has jumped outside the method space.

Next there is the 'Pause' button. It stops the execution and displays the current state of the JVM. It is helpful when some task is working for too long; after stopping it the picture can be viewed in plenty of detail. Another useful button is the 'Asm' button, which steps one assembler instruction at a time and displays the disassembled fragment of the code near the current instruction in the center panel, while the processor registers are displayed in the right panel. The break address field displays the address of the currently executing instruction, and when stepping over assembler instructions it is possible to append to it, for example, '+2' and press the Run button to skip an assembler procedure call (whose length is 2 bytes).

There is also a group of buttons on the right side. Above the group there is the address drop down list. The data in the list is used by a number of the buttons below. After a number in the list is used by a button, it is appended to the list and can be selected again instead of being typed. It is also useful to mark the number by appending any string after a space, for example '234bf44 jvm'. The mark is preserved in the list and makes it easy to find the number. The meaning of the buttons below the list is as follows:

These are all the currently implemented debugger capabilities. Frankly, the debugger could be better, but the work on it was a low priority task. Another source of debugger problems is the Qemu interaction. Qemu does not always follow the GDB protocol rigidly, and much better checking and circumvention of the protocol instability issues is needed. And of course, there could be my own bugs :)

In short: The debugger technology was developed and allows step by step debugging of Java code that runs within the jEmbryoS. It is also possible to step over assembler instructions and watch register values and raw memory.

Testing

There is a project dedicated to jEmbryoS testing. The project name is jEmbryoTest. It contains tests for the compiler, the memory allocators, the garbage collector and the scheduler. The compiler tests are implemented for every bytecode, and one test checks the method call implementation. The allocator tests are implemented for bit marked memory maps of type int, long and array, and for the linked list memory map. The scheduler test covers scheduler command processing and can be used to assess the uniformity of the runtime distribution. For debugging the scheduler test itself there is an option to use a recorded list of scheduler commands. The scheduler test prints the generated commands to System.out, and if there is a problem the printed text can be copied and pasted into the file scheduler.txt under the jEmbryoTest project. On the next start the scheduler test reads the commands and, if the list contains at least one item, executes the commands from the list. The memory allocator and scheduler tests are jUnit based, while the compiler test uses its own miniframework. The compiler test uses Qemu and requires it to be installed and its location to be given in the file debug.properties of the project jEmbryoDebug.

The locations of important test classes are as follows:

In short: Tests are implemented for the compiler component of the jEmbryoS and also for its memory allocators and context scheduler.

Results achieved

The main result of the jEmbryoS development so far (beginning of the year 2014) is a technology demonstration of the Java Operating System. A set of auxiliary technologies was also developed. Together the jEmbryoS and these technologies create an environment for the popularization and extension of the Java Operating System. Hopefully, this project can provide an easy start for Java developers interested in a very promising area of development, the Java Operating System. And one more point: the jEmbryoS has an interesting advantage compared to the existing Java OS variants, namely that it is ready for change. Yes, having a smaller code base, the system has no legacy. And what is legacy ? Can Microsoft make a new Android from Windows in a year or two ? It can, if Windows is thrown into the trash can.

Projects and dependencies

There are 15 projects developed (including documentation). The system is split into projects for the following reasons: independence of the technologies, architectural transparency, security and the circular reference problem. The first two reasons are obvious. The security issue arises when the layer isolation problem is analyzed. There should be no means of access to the system layer from other layers, and this requirement leads to the separation of the system layer. The system layer should also be separated from the architectural point of view. However, the security requirement prevents access to the system layer even in cases when it is needed, as is the case with the adopted part of the Java API. That is why the project jEmbryoHelperAPI was introduced. It also helps to break a circular dependency between the projects.

The following is the list of developed projects, with the purpose of each project and its dependencies described:

In short: The jEmbryoS development efforts have resulted in 15 projects.

Performance and limitations

The jEmbryoS performance at the current stage of development is very weak. It can be assessed as 40 to 50 times slower on average than the existing versions of the JVM from Oracle (formerly Sun's JVM). The main reason for such underperformance is the missing optimizing compiler. There are also cases where performance was traded for simplicity and ease of development and debugging.

In the area of Java API support the system lacks a lot of standard classes. For the Java API to be supported in full, such things as networking, hard disk access and so on have to be implemented. Currently only keyboard and text screen drivers are developed. The keyboard driver supports the so called standard PS/2 keyboard, which is not always provided by hardware vendors, and as a consequence, unfortunately, not all modern computers can run jEmbryoS.

There are still system reliability issues in play. After active usage the system tends to hang, and this behavior has not been eliminated yet. To catch the problem more advanced debugger features are required. And, of course, the system code should be carefully checked to improve the processing of critical situations.

But as a whole the system works, and it is suitable for testing and improving existing and new Java OS technologies.

In short: The system performance and reliability are not ideal and require additional efforts for the system to became more stable and for it to work faster. But as a starting point of the Java OS development the system seems as an acceptable variant.

System installation

Warning - if you write the system image to your installation media, all information on the media will be lost.

The preferred way to install the system is to create a bootable USB flash disk. The main advantage of such a disk is its ease of use. It is, of course, also possible to install the system on a hard disk, and the installation steps are the same in both cases. There is actually just one step: the system image has to be written to the target media. The image provided with the system distribution contains a FAT32 file system with two files, system.bin and sample.jar. The first file is the data that is loaded into memory. The second file is an archive that includes a set of sample applications. For the system to start properly, the image.bin file has to be written to the installation media at the correct position, which is the very start of the media. This is not a standard operation that a file system can do for you, so a tool is needed that writes the image blocks directly to the media in the order they appear in the image. I use the HxD editor, but it works fine only on Windows XP. Vista and its successors prevent the editor from writing to the system area of a media. However, there are many tools that can do the required work.
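The raw write described above can be sketched in Java. This is only an illustration of the idea, not a tool shipped with jEmbryoS: the class name RawImageWriter is hypothetical, and the target path (a raw device such as /dev/sdX on Linux) is an assumption that depends on your operating system and usually requires administrator rights. The sketch copies the image bytes to the target starting at position 0, in the order they appear in the image.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: copies a raw image onto a target, starting at byte 0. */
final class RawImageWriter {

    /** Writes every byte of the image to the target in image order; returns the byte count. */
    static long writeImage(Path image, Path target) throws IOException {
        try (FileChannel in = FileChannel.open(image, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target, StandardOpenOption.WRITE)) {
            ByteBuffer buffer = ByteBuffer.allocate(1 << 20); // 1 MiB chunks
            long position = 0;                                // the media start
            while (in.read(buffer) != -1) {
                buffer.flip();
                while (buffer.hasRemaining()) {
                    position += out.write(buffer, position);
                }
                buffer.clear();
            }
            out.force(true); // flush to the device before the media is removed
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: RawImageWriter <image.bin> <target device or file>");
            return;
        }
        // WARNING: everything stored on the target will be overwritten.
        long written = writeImage(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Bytes written: " + written);
    }
}
```

The target is opened for writing only, without truncation, so the same code can be tried safely against an ordinary file before it is pointed at a real device.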

Once you have a tool for writing a raw system image to the installation media, all you have to do is tell the tool to write the image.bin file to the target media. After the image has been written, the media will contain the FAT32 file system with two files on it. The maximum size of the file system is 4 gigabytes, which can be smaller or larger than the actual media. If the media is smaller, you should be careful and check whether copied files would occupy the missing space beyond the end of the media. Next, it is important not to change the location of the files on the media. If you wish to use the media to store other files, just do not delete the files system.bin and sample.jar and the required order will be kept. It is possible to replace the sample.jar file with your own, but the new file has to be placed sequentially right after system.bin. If there are no files other than system.bin, any newly copied file will be placed at the correct position; if there are other files, the position will be wrong and the sample applications won't work. The sample.jar file can not grow by more than 16 kilobytes, otherwise only part of the jar will be loaded and the system will be unable to read it.
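The size limit for a replacement sample.jar can be checked before copying. The sketch below is an illustration only: the class and method names are hypothetical, and it assumes that "16 kilobytes" means 16 * 1024 bytes and that the limit is measured against the size of the sample.jar shipped in the image.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical check of the sample.jar growth limit described in the text. */
final class SampleJarSizeCheck {

    // Assumption: "16 kilobytes" is read as 16 * 1024 bytes.
    static final long MAX_GROWTH_BYTES = 16L * 1024;

    /** Returns true if the replacement jar grows by no more than the allowed amount. */
    static boolean replacementFits(long originalSize, long replacementSize) {
        if (originalSize < 0 || replacementSize < 0) {
            throw new IllegalArgumentException("sizes must be non-negative");
        }
        return replacementSize - originalSize <= MAX_GROWTH_BYTES;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: SampleJarSizeCheck <original sample.jar> <replacement jar>");
            return;
        }
        long original = Files.size(Paths.get(args[0]));
        long replacement = Files.size(Paths.get(args[1]));
        System.out.println(replacementFits(original, replacement) ? "fits" : "too large");
    }
}
```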

Next, the media can be used to boot a computer. In the case of a USB flash disk it is often necessary to change the computer's BIOS settings to allow booting from a USB disk. After the settings are changed you can insert the installation USB disk into a USB slot; after turning the computer on, it is often necessary to press one of the F keys to tell the BIOS you want to use the flash media. If everything is done right the system will start loading, and you will see the screen fill with dots as blocks of the image are copied into memory. After a short initialization the system will show a prompt and the cursor will become visible. Now you can test jEmbryoS :)

In short: The system installation consists of just writing a raw image to the installation media.

Console application usage

The console application provides an interface for interacting with the system user. The currently implemented interface is text based and resembles similar interfaces of many mature operating systems. The text implementation is simple and requires no video card driver, because on all VGA-compatible devices the text screen is accessed by reading and writing characters and their attributes directly in memory. However, there is a text screen driver, which hides the details of memory access and exposes a relatively high level interface.
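The memory layout behind such a text screen can be sketched as follows. This is not the jEmbryoS text screen driver; the class name VgaTextBuffer and its methods are hypothetical. It relies on the conventional VGA color text mode layout: 80 columns by 25 rows, two bytes per cell (the character code followed by the attribute byte), with the foreground color in the low four bits of the attribute and the background in the high four bits. A byte array stands in for the memory-mapped region, which on real hardware conventionally starts at physical address 0xB8000.

```java
/** Hypothetical model of the VGA text-mode buffer: 2 bytes per cell, character then attribute. */
final class VgaTextBuffer {
    static final int COLUMNS = 80;
    static final int ROWS = 25;
    static final int BYTES_PER_CELL = 2;

    // Stand-in for the memory-mapped text buffer (conventionally at 0xB8000).
    private final byte[] memory = new byte[COLUMNS * ROWS * BYTES_PER_CELL];

    /** Byte offset of a cell from the start of the buffer. */
    static int offset(int row, int column) {
        return (row * COLUMNS + column) * BYTES_PER_CELL;
    }

    /** Packs colors into an attribute byte: background in bits 4-7, foreground in bits 0-3. */
    static byte attribute(int foreground, int background) {
        return (byte) (((background & 0x0F) << 4) | (foreground & 0x0F));
    }

    /** Writes a character and its attribute into the cell at the given position. */
    void put(int row, int column, char ch, byte attribute) {
        int at = offset(row, column);
        memory[at] = (byte) ch;
        memory[at + 1] = attribute;
    }

    /** Reads back the character stored in a cell. */
    char charAt(int row, int column) {
        return (char) (memory[offset(row, column)] & 0xFF);
    }

    /** Reads back the attribute stored in a cell. */
    byte attributeAt(int row, int column) {
        return memory[offset(row, column) + 1];
    }
}
```

For example, put(0, 0, 'A', attribute(7, 0)) stores a light grey 'A' on a black background in the top left cell.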

The console application was architected as a tree of views with heavy usage of the model-view-controller pattern. It is placed within the jEmbryoService project and its main class is org.jembryos.service.console.Console. The hierarchy of views includes the system screen, the console screen, the application screen, the text view for the command line, the scrollable area where the system output is shown, and the value view that shows the processor load as a column on the right side of the screen.
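A tree of views fed by a model can be sketched like this. None of these classes are taken from the jEmbryoS sources: LoadModel, View and ValueView are hypothetical names, and the parent-child relations used in the usage example are an assumption. The sketch shows only the view tree and the model-to-view notification; the controller part (keyboard handling) is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model: holds a processor load percentage and notifies listeners. */
final class LoadModel {
    interface Listener { void loadChanged(int percent); }

    private final List<Listener> listeners = new ArrayList<>();

    void addListener(Listener listener) { listeners.add(listener); }

    void setPercent(int value) {
        int percent = Math.max(0, Math.min(100, value)); // clamp to 0..100
        for (Listener listener : listeners) listener.loadChanged(percent);
    }
}

/** Hypothetical view: a node of the view tree that may own child views. */
class View {
    private final String name;
    private final List<View> children = new ArrayList<>();

    View(String name) { this.name = name; }

    /** Adds a child and returns this view, so calls can be chained. */
    View add(View child) { children.add(child); return this; }

    String describe() { return name; }

    /** Lists this view and all descendants depth first, indented by level. */
    List<String> outline() {
        List<String> out = new ArrayList<>();
        outline("", out);
        return out;
    }

    private void outline(String indent, List<String> out) {
        out.add(indent + describe());
        for (View child : children) child.outline(indent + "  ", out);
    }
}

/** Hypothetical value view: redraws itself from the latest model value. */
final class ValueView extends View implements LoadModel.Listener {
    private int percent;

    ValueView(String name, LoadModel model) {
        super(name);
        model.addListener(this);
    }

    @Override public void loadChanged(int percent) { this.percent = percent; }

    @Override String describe() { return super.describe() + " [" + percent + "%]"; }
}
```

A tree could then be assembled as new View("system screen").add(new View("console screen").add(new ValueView("processor load", model))), after which each model.setPercent(...) call is reflected in the next outline().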

All console commands are listed after the command 'help' or '?' is entered. The commands mostly display information about the system state, such as the list of running tasks and the garbage collector state. There is also a set of commands providing hardware information, such as the PCI device list and processor information. To scroll the console screen, press the PageDown or PageUp key; the arrow keys are used to display the command history and to move the cursor.

Another set of commands provides the means to start applications. For an application to be started it has to be within the sample application jar file, which is loaded with the main system image. The System installation section gives additional information about managing the sample applications jar. To get information about the content of the jar file there is the command JarTaskList. After a sample application is chosen, it is started by typing the simple name of its main class. If there is more than one application with the same class name (but in different packages), the full class name will help to point to the correct application. The console commands are case insensitive, but the class name has to be written retaining the original case of the letters. It is worth noting that the tradition of using very short commands is not maintained, because self-describing names speak for themselves much better than short and cryptic letter combinations; while the user base is growing, such an approach can be helpful.
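The name matching rules above can be illustrated with a short sketch. It is not the console's actual code: MainClassResolver and its methods are hypothetical, and raising an error for an ambiguous simple name is an assumption, since the text only says that the full class name helps in that case.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical resolver for the rules described in the text. */
final class MainClassResolver {

    /** Console commands are matched without regard to letter case. */
    static boolean isCommand(String input, String command) {
        return input.trim().equalsIgnoreCase(command);
    }

    /**
     * Resolves user input to a full class name. A full name matches exactly;
     * a simple name matches only if exactly one class in the jar carries it.
     * Class names are compared case sensitively.
     */
    static String resolve(String input, List<String> mainClasses) {
        List<String> matches = new ArrayList<>();
        for (String fullName : mainClasses) {
            if (fullName.equals(input)) {
                return fullName; // the full class name was given
            }
            String simpleName = fullName.substring(fullName.lastIndexOf('.') + 1);
            if (simpleName.equals(input)) {
                matches.add(fullName);
            }
        }
        if (matches.size() == 1) {
            return matches.get(0);
        }
        if (matches.isEmpty()) {
            throw new IllegalArgumentException("No application named " + input);
        }
        // Assumption: an ambiguous simple name is rejected.
        throw new IllegalArgumentException("Ambiguous name " + input + ", candidates: " + matches);
    }
}
```

With the classes demo.Hello, other.Hello and demo.Clock in the jar, the input Clock resolves to demo.Clock, the input other.Hello resolves to itself, and the input Hello is rejected as ambiguous.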

Once an application is running, its screen is shown. For a short time the screen will be black because of the application compilation delay, during which it is impossible to get any output from the application. Then the application will display its text (if programmed to, of course). The console supports many application screens and provides a means of switching between them. To switch to another application or to the console screen the standard Windows combination Alt+Tab is used; Ctrl+Tab also switches the screen. When Alt and Tab are pressed the task list dialog is shown, and if you hold Alt and press Tab again the selected task changes. After the Alt and Tab keys are released, the selected task or the console screen is displayed. If the task has finished or is no longer needed, the commands Alt+F4, Ctrl+F4 or Ctrl+C will close the application screen and kill the application if it is still running. The application screen can be scrolled using the four arrow keys or the PageDown/PageUp keys.
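The Alt+Tab behavior can be modeled as a small state machine. The class TaskSwitcher below is hypothetical and not taken from the console sources. In particular, it assumes that the first Tab press already moves the selection to the next screen and that the selection wraps around at the end of the list; the text does not specify either detail.

```java
/** Hypothetical model of screen switching with Alt+Tab. */
final class TaskSwitcher {
    private final int screenCount; // console screen plus application screens
    private int current;           // index of the screen being displayed
    private int selected;          // index highlighted in the task list dialog
    private boolean dialogOpen;

    TaskSwitcher(int screenCount) {
        if (screenCount < 1) {
            throw new IllegalArgumentException("at least one screen is required");
        }
        this.screenCount = screenCount;
    }

    /** Tab pressed while Alt is held: open the dialog if needed, then advance. */
    void tabPressedWithAltHeld() {
        if (!dialogOpen) {
            dialogOpen = true;
            selected = current;
        }
        selected = (selected + 1) % screenCount; // assumption: wraps around
    }

    /** Alt released: the highlighted screen becomes the displayed one. */
    void altReleased() {
        if (dialogOpen) {
            current = selected;
            dialogOpen = false;
        }
    }

    int current() { return current; }

    boolean isDialogOpen() { return dialogOpen; }
}
```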

In short: The console application provides the user with the system information and means to run applications. The task management is partially provided by the application screen management commands. Due to the embryo state of the system there are not too many tools the console can deliver.

How to extend

The areas most in need of extension are described in the section Steps to be taken. But, of course, if you are interested in other areas, you will find a lot to be done there as well.

For the system to be extended effectively it is better to concentrate on a single area. When the area of interest is chosen, it is recommended to read about it in this documentation. Initial familiarity with the subject will let you look at the system code with deeper understanding. Then it is time to copy all jEmbryoS projects and open them in your favorite IDE. If the IDE is Eclipse, the projects are ready to use out of the box. If it is another IDE, one more step is required: to ensure that the project interdependencies are set up. For this it is recommended to read the section Projects and dependencies. After the projects are opened and compile without errors, it is time to look at the code of the project that interests you and to do something useful there.

Next, the system image should be built. The project jEmbryoBootstrap exists for building the image. More information about it can be found in the section System image. Once the system image is created, it becomes possible to test and debug it. The most convenient tool for a test run is an emulator. There is a debug project which uses the Qemu emulator and provides a user interface for step by step debugging. Additional information about it can be found in the Debugging section. And if the updated version works well, the remaining task is to install the system and test it on real hardware. The installation steps are covered in the System installation section.
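For a quick test run outside the debug project, the emulator can also be started directly. The sketch below is independent of the jEmbryoS debug project and makes several assumptions: that QEMU is installed and on the PATH, that the qemu-system-i386 binary is appropriate for the target, and that the class name EmulatorLauncher is free to use. The QEMU options themselves are standard: -drive with format=raw attaches the image as a raw disk, -s opens a gdb stub on TCP port 1234 and -S keeps the processor stopped until a debugger tells it to continue.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical launcher that boots a raw system image in QEMU. */
final class EmulatorLauncher {

    /** Builds the QEMU command line for the given image file. */
    static List<String> command(String imagePath, boolean waitForDebugger) {
        List<String> cmd = new ArrayList<>();
        cmd.add("qemu-system-i386"); // assumption: 32-bit x86 target, QEMU on the PATH
        cmd.add("-drive");
        cmd.add("format=raw,file=" + imagePath); // attach the image as a raw disk
        if (waitForDebugger) {
            cmd.add("-s"); // gdb stub on tcp::1234
            cmd.add("-S"); // do not start the CPU until the debugger continues
        }
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 1) {
            System.err.println("usage: EmulatorLauncher <image.bin> [debug]");
            return;
        }
        boolean debug = args.length > 1 && args[1].equals("debug");
        Process process = new ProcessBuilder(command(args[0], debug)).inheritIO().start();
        System.exit(process.waitFor());
    }
}
```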

When all tests show that the system works no worse than it did before, it is advised to share your results so that they can be included in the common version of the system. Having your results there makes it easier to merge system updates from other developers with your own work.

In short: The following steps are recommended - choose the area of interest, read about it, open all projects in your IDE, make changes, build the system image, debug, test on real hardware and commit the changes to the system repository.

Quality expectation disclaimer

At the time of writing (the beginning of 2014) jEmbryoS is by no means a mature system. It should be viewed as a technology demonstration or a set of technologies to be extended. To become a production ready system it has to be extended in many areas, which are described in the section Steps to be taken. However, until the extension steps are implemented it can be useful as a playground for architecture assessment and component development. And, of course, it can be used as a base for the extension steps.

Copyright (C) by Alexey Bezrodnov, year 2014