![]() Once we have counts in a tree, they introduce an entirely new way to search the tree. So we can add counts to a tree and still maintain it efficiently. Since the height of the tree is O(log N), this only takes you O(log N) time. For example, when you add an element, you increment the count of the node containing it, and then work back up the tree to the root incrementing the counts in all the nodes you go past. This can be done without sacrificing the log-time characteristics of the operations. Operations which alter the tree (insertion and deletion) now have to make sure these counts remain accurate. In that field, you store a count of the number of elements in or below that node. Suppose you add an additional field to every node of a balanced tree. ![]() Is there anything we can do to our balanced trees to make this work better? The only criterion we could reasonably sort on would be byte position in the file and as soon as we store our data as a set of (position, data) pairs, we're back to insertion being linear again, because we would have to alter the position field of every tree element after the insertion point. The conventional use of a balanced tree to store a sorted list, however, is not immediately helpful to us. This sounds like the kind of compromise we want: if making insertion constant-time forces seeking to be linear and vice versa, we would prefer to arrange for both to be log-time. You can insert an element into a balanced tree in log time, and you can search for a particular element in log time as well. Balanced tree structures (any of AVL trees, red-black trees and B-trees) all solve this sort of problem for sorted lists. Whereas in the original array format, of course, seeking was constant-time but insertion became linear-time. The common problem in both these methods is that as soon as you make insertion a constant-time operation, seeking to a given byte position becomes linear-time. In practice it isn't too bad, since the length of the linked list is at worst 1/K times the size of the file but once the file size becomes seriously big, this approach does not scale well. Jumping to a particular position, however, is still an O(N) operation using this structure. Inserting a single byte in the middle of a block doesn't cost too much occasionally the block will grow beyond size 2K and have to be split into two smaller ones, but even that isn't too slow. On the other hand, moving through the file now becomes a slow operation it's not noticeable when you're moving by a byte, by a line, or even by a screenful at a time, but as soon as you try to jump to the start or end of the file, or jump to a particular specified file offset, suddenly the editor has to bodily shift enormous amounts of file data from one end of the gap to the other.Īnother slightly better option is to use a linked list of small arrays, and to let the arrays vary in size between K and 2K bytes, for some fixed minimum block size K. When the user inserts an extra character, you just add it to one end or other of the gap. ![]() The file contents up to the current cursor position are stored at the start of the array the file contents from the current cursor position to the end are stored at the end of the array and the gap in the middle moves about as the cursor does. One technique used to support insert mode in editors is to use an array larger than the file size, with a gap in it. In this article I present an efficient and scalable data structure which supports all the operations needed by a hex editor. And as soon as you want your hex editor to have an insert mode, the data structure question becomes much more interesting. Not all types of file you might want to edit have the same restrictions as an executable. On the other hand, an insert mode can be useful in other circumstances. The only operation you really need to be able to do efficiently is to jump to a particular byte position, and that's precisely what an array makes easy. And a hex editor without an insert mode is very easy to implement: you simply allocate a large enough array for the input file, and use that as your data structure. Since they are mostly used for editing files such as executables, which contain a lot of cross-references to particular byte positions in the file, a hex editor need not have an insert mode in order to be useful. Hex editors have been around for a long time, and at the very basic level they are very simple to write. An Efficient Data Structure For A Hex Editor An Efficient Data Structure For A Hex Editor
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |