Wednesday, 21 March 2012

Set Theory

I had a code problem that most programmers would be very familiar with. I wanted to create a set of names. The definition of set is basically a number of items where each item must be a unique element. So for example, if I add the name "Dave" twice it only stores one copy of the name. The important requirement was that the set returned the index (number) of the name in the array .. for example

0: Craig
1: Jen
2: Dave
3: Gary
4: Peter

when inserting the name "Dave" I want the number 2 to return from the function.

The easy way to perform this is to say "for each element, if name equals the new Name, return the number".

This method becomes quite slow for very large lists of names. So I wrote this class ...


class ArraySet
{
public:

string& Get(int p)
{
if( p < elems.size() )
{
return elems[p];
}
else return string("");
}

void Append(string &s)
{
elems.push_back(s);
}


vector<string> elems;
};

class TreeNode
{
public:

TreeNode()
{
parent = NULL;
left = right = NULL;
data = -1;
searchVisited = false;
}

TreeNode(TreeNode* pParent)
{
parent = pParent;
left = right = NULL;
data = -1;
searchVisited = false;
}

int data;
bool searchVisited ;

int Insert(string &s2, ArraySet &set)
{
string s1 = "";
s1 = set.Get(data);

if( s1 == s2 )
{
return data;
}
else
{
if( s1 < s2 )
{
if( right )
{
return right->Insert(s2, set);
}
else
{
right = new TreeNode(this);
set.Append(s2);
right->data = set.elems.size()-1;
return right->data;
}
}
else if( s1 > s2 )
{
if( left )
{
return left->Insert(s2, set);
}
else
{
left = new TreeNode(this);
set.Append(s2);
left->data = set.elems.size()-1;
return left->data;
}
}
}
}

void Visit(vector<int> &rmp)
{
if( left )
{
left->Visit(rmp);
}

rmp.push_back(data);

if( right )
{
right->Visit(rmp);
}
}

TreeNode* parent;
TreeNode* left;
TreeNode *right;
};

class TreeIndexedSet
{
public:

TreeIndexedSet()
{
root = NULL;
}

int InsertString(string &str)
{
if( root )
{
return root->Insert(str, m_set);
}
else
{
root = new TreeNode;
m_set.Append(str);
root->data = m_set.elems.size()-1;

return root->data;
}
}

string& LookupString(int p)
{
return m_set.Get(p);
}

void SortRemap()
{
if( m_set.elems.size() > 1000000 )
{
SortRemapIterate();
}
else
{
root->Visit(sortedRemapping);
}
}

void SortRemapIterate()
{
        TreeNode *currNode = root;

        while (true)
        {

            if (currNode == NULL)
                break;

if (currNode->left != NULL && currNode->left->searchVisited != true)
{
currNode = currNode->left;
}
            else if (currNode->right != NULL && currNode->right->searchVisited != true)
            {
currNode = currNode->right;
}
            else if (currNode->searchVisited == false)
            {
sortedRemapping.push_back(currNode->data);
                currNode->searchVisited = true;
            }
else
            {
currNode = currNode->parent;
}

        }
}

vector< int > sortedRemapping;
TreeNode* root;
ArraySet m_set;
};

Tuesday, 24 January 2012

Memory Management ...

My old post on Memory mapped files made me think about how to implement some technology that is rare in current software knowledge bases.

This lead me to delve into my memory for this website ... a strong and free memory manager is available here

http://www.paulnettle.com/

Listen to the music ...
The concept is simple ... to replace the hash table memory management system here by removing calls to malloc and to simply use a memory mapped file to map large areas of the disk to the process, the limitation is theoretically the address space of the program, but normal practical terms limits this to 2GB even for 64 bit applications (citation needed). This means technically that I can have computers use 2GB RAM and 2GB RAM for free!

http://msdn.microsoft.com/en-us/library/windows/desktop/aa366761(v=vs.85).aspx

The function MapViewOfFile(...)
The gives access to a portion of the file in memory, so if for example I open a 2GB file then I can map a portion, say 1MB at the 1.5GB boundary, and all I have to do is supply the numbers for the offsets into the file, then I have 1MB from the 2GB data pool to read and write to. I think this explains why I need a decent memory management implementation to work with.

Sunday, 8 January 2012

Preparing for a release

I am working on finishing my task list for version 1 of my recent project, and I have a todo list with 79 tasks mostly marked as "ready for retest". The development process involves much testing and bug fixing, possibly more testing than anything else, so I am moving to a beta test phase asap.

I run into a psychological barrier when I try to work on dialog's ...
Dialog's are the small windows that appear when a program function requires some settings and user input, usually filled with radio boxes, check boxes, choice boxes and buttons. The problem is that they are just completely boring, and so development stops for a while.

Wednesday, 4 January 2012

Memory Mapped Files

Ok so I am at the pre-beta stage in my project and I discover that I cannot effectively load 100Mb files without using huge amounts of system memory, I look for a solution and find this ...

Memory Mapped Files.

These files are stored in virtual memory and allow the program to have read/write access - and they can be huge in the range of >2Gb. They also speed up read/write access by lowering the number of copy operations needed.

They are implemented in Boost for C++ and have been considered for elevation to the standard C++ library (unless they haven't already been included). So I had to download Boost and try to use one ... Boost is currently building atm, a completely ingenious build system that is very complex to work out.

Sunday, 1 January 2012

C++ programming Getting rid of sprintf()

like the name, I suspect the function "sprintf()" is very fast, but for the sake of non-programmers I am going to explain what it does.

http://www.cplusplus.com/reference/clibrary/cstdio/sprintf/

so if I want to format a block of text into a buffer with numbers I can use sprintf(). The only problem is with the safety of using sprintf, the convenience of allocating an array of bytes in memory every time I want to make some text

Well now (and for a very long time since the creation of the C++ std library) there is a safer and less buggy solution, that also makes the code look a lot neater. Its called "crawls()" ... just kidding.

Now I use this function


template <class T>
inline std::string to_string (const T& t)
{
std::stringstream strStream;
strStream<< t;
return  strStream .str();
}

this turns the number into a string without all the old-style conditional statements.