Roller Coaster Ride

Traces to leave, thoughts to share.

Xml Parsing (and Indexing)

Posted by alimbourg on May 30, 2010

One of the ideas behind GFE ([dje-fe], see previous post) is to be the least intrusive and most responsive application possible. A front-end shouldn’t need long processes that hold up a user who is about to play, nor any private files written to disk (especially if you want to run it from a DVD).
But GFE needs to display information about the currently selected game (from a collection of 10000 items).

Such data exists and is usually available as (big) XML files… They’re generated by Mame.exe, or available from the Net (google ‘dat file emulator’).

Hence the idea: parse some 40 MB of XML data on the fly (at each run), and index/map the document to avoid keeping the whole file in memory.

After two weeks studying the topic, it’s time for conclusions:

  1. Microsoft’s XML pull parser is fast. It’s written in .NET, but *correctly* written, and it’s generally a good tool for your XML needs. It’s a pull system, easier to manipulate than SAX, and probably faster (look at XmlTextReader for reference). It takes 750 ms to walk 40 MB of data (8800 records) on my computer.
  2. Unfortunately you can’t bookmark the interesting parts of your document with it. We could rely on the line/character pair this parser returns, but then another text parser would have to extract the text blocks from those line/character pairs. That may be tricky to implement, and we’d lose some horsepower in the process.
  3. Note that parsing XML files is easy as long as we don’t want document validation or XPath support: the state machine only has to handle 10-12 token types, and C# silently handles the various character encodings.
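To give a feel for how small that token set is, here is a minimal sketch of the decision a non-validating parser takes on whatever follows a ‘<’. It is written in C++ rather than C#, it is not GFE’s code, and the `Markup` enum and `classify` function are hypothetical names made up for the illustration.

```cpp
#include <string>

// Hypothetical token kinds, for illustration only (not GFE's TokenType).
enum class Markup {
    Comment, CData, DocType, ProcessingInstruction,
    EndElement, StartElement, Invalid
};

// Classify the markup that starts at text[0], which must be '<'.
// Only the first few characters are inspected; nothing is validated.
Markup classify(const std::string& text)
{
    if (text.size() < 2 || text[0] != '<')
        return Markup::Invalid;

    switch (text[1]) {
    case '!':
        // compare(pos, len, s) clamps len to the end of the string
        if (text.compare(1, 3, "!--") == 0)      return Markup::Comment;
        if (text.compare(1, 8, "![CDATA[") == 0) return Markup::CData;
        if (text.compare(1, 8, "!DOCTYPE") == 0) return Markup::DocType;
        return Markup::Invalid;
    case '?':
        return Markup::ProcessingInstruction;
    case '/':
        return Markup::EndElement;
    default:
        return Markup::StartElement;
    }
}
```

For instance `classify("</game>")` gives `Markup::EndElement`. Add attributes, element data, whitespace and end of stream, and you are already close to the 10-12 token types mentioned above.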

So what? Why not write a dedicated indexing XML pull parser?

Some results after a dozen hours of coding on a custom pull parser implementation:

  1. It’s now as fast as MS XmlTextReader. The first version wasn’t :) Too many function calls hit C# performance very hard (8 times slower than the original!). The solution was to pack the whole state machine into a single function: it now easily indexes 50 MB of XML per second.
  2. It’s able to give the file stream position of each element’s start and end: during an initialization pass, an indexer stores each record’s key and location in a dictionary, for fast access later on.
  3. When GFE needs the info for a particular record, it asks the dictionary for the location of the fragment, which is then read from the huge file and loaded into memory to be thoroughly parsed. Apparently it’s fast enough for 100 random requests per second. That’s 100 times more than enough :)
  4. there is no 4.
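The index-then-fetch idea from points 2 and 3 can be sketched in a few lines. This is not the GFE implementation: it is C++ instead of C#, it works on an in-memory string instead of a file stream, and it uses a plain substring search where GFE uses its pull parser. The `<game name="...">` record shape, `Span`, `buildIndex` and `fetch` are all assumptions made for the illustration.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical location of one record inside the XML text.
struct Span {
    std::size_t offset;
    std::size_t length;
};

// Single pass over the document: remember where each
// <game name="..."> ... </game> record starts and how long it is,
// keyed by the value of its name attribute.
// Assumes well-formed input, no nested <game>, no self-closing records.
std::unordered_map<std::string, Span> buildIndex(const std::string& xml)
{
    const std::string open = "<game name=\"";
    const std::string close = "</game>";
    std::unordered_map<std::string, Span> index;

    std::size_t pos = 0;
    while ((pos = xml.find(open, pos)) != std::string::npos) {
        const std::size_t keyStart = pos + open.size();
        const std::size_t keyEnd = xml.find('"', keyStart);
        if (keyEnd == std::string::npos)
            break;
        std::size_t end = xml.find(close, keyEnd);
        if (end == std::string::npos)
            break;
        end += close.size();
        index[xml.substr(keyStart, keyEnd - keyStart)] = Span{pos, end - pos};
        pos = end;
    }
    return index;
}

// Fetch the raw text of one record. With a real file this would be
// a seek to where.offset followed by a read of where.length bytes.
std::string fetch(const std::string& xml, const Span& where)
{
    return xml.substr(where.offset, where.length);
}
```

Only the small map stays in memory between requests; the fragment returned by `fetch` is what gets handed to a full parser when a record is actually displayed.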

This ‘works’ and fits a particular situation for a particular embedded application: no memory stress, no alien file creation. Currently its only limitation is minimal support for character types (it only handles ANSI + Unicode).

But it definitely looks like a workable way to go if the existing implementations get in your way.

Below is the source code of the main state machine function; contact me for more implementation details.

public TokenType Next(TokenType notifs)
        {
        _insideElement:
            if (_currentType < TokenType.END_ELEMENT)
            {
                for (; ; )
                {
                    //attribute
                    switch (_ioBuffer[_bufferIndex])
                    {
                        case ' ':
                        case '\t':
                        case '\r':
                        case '\n':
                            _bufferIndex++;
                            continue;
                        case '>':	//end of start element
                            _bufferIndex++;
                            _depth++;
                            goto _beyondElement;
                        case '/': //empty element
                            _bufferIndex++;
                            if (_bufferIndex + 1 > _dataLen)
                                PrefetchIO(1);
                            if (_ioBuffer[_bufferIndex] == '>')
                            {
                                _bufferIndex++;
                                _tokenEndOffset = _bytesParsed + _bufferIndex;
                                _currentType = TokenType.END_ELEMENT; //EMPTY ELEMENT ! it starts and ends on the same token
                                if ((notifs & TokenType.END_ELEMENT) != 0)
                                    return _currentType;
                                goto _beyondElement;
                            }
                            return DoExpected(">");
                        case '\0':
                            if (!FillIOBuffer())
                                return TokenType.END_OF_STREAM;
                            continue;
                        default:
                            _currentType = ProcessAttribute((notifs & TokenType.ATTRIBUTE) != 0);
                            if ((notifs & TokenType.ATTRIBUTE) != 0)
                                return _currentType;
                            continue;

                    }
                }
            }
        //
        //beyond element: data, or comment, or pi, cdata
        _beyondElement:
            for (; ; )
            {
            _start:
                switch (_ioBuffer[_bufferIndex])
                {
                    case ' ':
                    case '\t':
                    case '\r':
                    case '\n':
                        ++_bufferIndex;
                        continue;
                    case '\0':
                        if (!FillIOBuffer())
                            return (_currentType = DoEndOfStream());
                        continue;
                    case '<':
                        _tokenStartOffset = _bytesParsed + _bufferIndex;
                        ++_bufferIndex;
                        for (; ; )
                        {
                            switch (_ioBuffer[_bufferIndex])
                            {
                                case '!': //comment, cdata, doctype
                                    _bufferIndex++;
                                    if (_bufferIndex + 7 > _dataLen)
                                        PrefetchIO(7);
                                    if ((_ioBuffer[_bufferIndex] == '-') && (_ioBuffer[_bufferIndex + 1] == '-'))
                                    {
                                        _bufferIndex += 2;
                                        ProcessComment();
                                        goto _start;
                                    }
                                    if ((_ioBuffer[_bufferIndex] == '[') && (_ioBuffer[_bufferIndex + 1] == 'C') && (_ioBuffer[_bufferIndex + 2] == 'D') &&
                                        (_ioBuffer[_bufferIndex + 3] == 'A') && (_ioBuffer[_bufferIndex + 4] == 'T') && (_ioBuffer[_bufferIndex + 5] == 'A') &&
                                        (_ioBuffer[_bufferIndex + 6] == '['))
                                    {
                                        _bufferIndex += 7;
                                        ProcessCData();
                                        goto _start;
                                    }
                                    if ((_ioBuffer[_bufferIndex] == 'D') && (_ioBuffer[_bufferIndex + 1] == 'O') && (_ioBuffer[_bufferIndex + 2] == 'C') &&
                                        (_ioBuffer[_bufferIndex + 3] == 'T') && (_ioBuffer[_bufferIndex + 4] == 'Y') && (_ioBuffer[_bufferIndex + 5] == 'P') &&
                                        (_ioBuffer[_bufferIndex + 6] == 'E'))
                                    {
                                        _bufferIndex += 7;
                                        ProcessDocType();
                                        goto _start;
                                    }
                                    return DoUnexpected("-");
                                case '?': //pi
                                    _bufferIndex++;
                                    ProcessPI();
                                    goto _start;
                                case '\0':
                                    //--- fill data
                                    if (!FillIOBuffer())
                                        return (_currentType = TokenType.END_OF_STREAM);
                                    continue;
                                case '/': //end element </ ...>
                                    _bufferIndex++;
                                    _currentType = ProcessEndElement((notifs & TokenType.END_ELEMENT) != 0);
                                    if ((notifs & TokenType.END_ELEMENT) != 0)
                                        return _currentType;
                                    goto _start;
                                default:  //new element
                                    //(notifs & TokenType.START_ELEMENT) != 0);
                                    _currentType = ProcessStartElement(true); //always parsed in full, as the element name is extremely useful to have
                                    if ((notifs & TokenType.START_ELEMENT) != 0)
                                        return _currentType;
                                    goto _insideElement;
                            }

                        }
                    default: //element data. (probably.)
                        _currentType = ProcessData((notifs & TokenType.DATA) != 0);
                        if ((notifs & TokenType.DATA) != 0)
                            return _currentType;
                        goto _start;
                }
            }
        }

Posted in Code, GFE

GFE, Game Front End

Posted by alimbourg on May 11, 2010

Hey,

I spent the last few months fighting/juggling/coding with WPF. Not the easiest hobby in town, I should mention. Anyway: this is GFE.

The idea behind the project is to be able to… Play. No fuss, and more generally minimal settings, to list and launch a huge collection of games: emulated, freeware, Flash, the choice is vast.

Basically GFE lists a directory of… things (ROMs, zips, pics, you name it), probably thousands of them. Background scanners try to match every entry with snapshots and movies (and, later on, stats and extra info from the web). Apart from the initial enumeration, everything is asynchronous for a smoother experience.

All settings are guessed from a directory structure passed as the application argument, or preferably from a gameinfo.xml file.

It uses low-level input, preferably a joystick, which allows some interaction with GFE while it runs in the background. And it’s a WPF project, so all the front-end rendering goes through DirectX, hardware-accelerated if available.

OK now, this is the very first version of GFE, and, well: it’s usable, but probably not versatile or cute enough. I’m working on that: you can help with suggestions, money, or simple greetings.

Binary: GFE-0.5.0.zip (45 KB)

Technical details:

Windows – .NET 3.5, works on XP and Seven

Joystick (XInput compatible) or Keyboard

How to run it:

Modify the included GameInfo.xml and pass it as the application parameter.

Note: when GFE is in the background, pressing LB+RB together (Ctrl+Back) kills the launched app and brings GFE back to the front. LB+Y exits GFE.

alimbourg at gmail for any question.

This is GFE Emu FrontEnd

Posted in frontend, Uncategorized

OWL

Posted by alimbourg on March 10, 2010

Soon…

Posted in Uncategorized

Developing for MediaCenter (Seven)

Posted by alimbourg on August 5, 2009

Windows Media Center

You know, I’d like to have a game museum available. For the kids, for me: as a former game developer I’m utterly *delighted* to be able to run all the past games from (so many) abandoned platforms… Because sometimes the fun is still there, nice game mechanics ARE eternal…
Anyway, the last 30 years add up to a lot of video games to reference and play, hence the need for a good front-end.
And why not use MS MediaCenter (MCE)?
That program is cool. Not *great*, and it does not support emulation, but still really cool.

Therefore we need to develop a plugin dedicated to mass-gaming. Fortunately Microsoft provides an SDK. But… Wait… Did anyone really try to develop with such a thing? It’s terrible: everything has to be done from scratch, for strange reasons, because it’s flawed by design. Samples, public interfaces, resources: everything keeps you away from the look and feel of MCE. So reinvent the wheel, or move along… Apparently.

Until you find 2 things:
Reflector: an MSIL disassembler and decompiler. Add an add-in to batch-decompile a complete assembly, and you have all the MediaCenter sources in the palm of your hand.
and
ResTuner, to extract the MCML resources from a DLL/EXE and get the MCML files (the markup files needed to design MCE interfaces). Alternatives: ResourceExtract, NirSoft or Alonis Resource Extract.

Now there are 5 MB of C# code and MCML source files to browse…
Reverse engineering at its best…

Leave a message if you need details about this.
Cheers AwenWMC7

Posted in MediaCenter

Mame132 mods for automated avi creation

Posted by alimbourg on July 10, 2009

This is about creating a special MAME build that allows automation, via AutoIt for example: I needed a way to send key inputs and trigger AVI recording of my games, at will.
As MAME uses ‘raw input’, it can’t be fooled by standard key messages.

Fortunately this project is really really (really) easy to modify and build, so I patched some

Go there and follow every step: http://mamedev.org/tools/

Then modify some sources.

First, more Windows messages to allow communication (for key events and AVI recording):


//in windows.c/winwindow_video_window_proc, around line 1361
//declared somewhere in windows.c:
// void rawinput_keyboard_fake_update(int dik_code, int pressed);
case WM_USER+10:
    //fake key event: wparam is the DIK code, lparam the pressed state
    if (lparam&0x80) MessageBeep(0);
    rawinput_keyboard_fake_update(wparam, lparam);
    break;
case WM_USER+11:
    //start recording if no movie is active, stop it otherwise
    if (!video_mng_is_movie_active(window->machine))
    {
        const char* filename = NULL;
        if (lparam!=0)
            filename = options_get_string(mame_options(), OPTION_MNGWRITE);
        //video_mng_begin_recording(machine, NULL);
        video_avi_begin_recording(window->machine, filename);
        popmessage("REC START");
    }
    else
    {
        //video_mng_end_recording(machine);
        video_avi_end_recording(window->machine);
        popmessage("REC STOP");
    }
    break;

Then a tweak to bypass raw inputs


//in input.c, around line 1964, before rawinput_keyboard_update
//
void rawinput_keyboard_fake_update(int dik_code, int pressed)
{
    device_info *devinfo;
    //write the key state straight into every raw input keyboard device
    for (devinfo = keyboard_list; devinfo != NULL; devinfo = devinfo->next)
        if (devinfo->rawinput.device != NULL)
        {
            devinfo->keyboard.state[dik_code] = pressed?0x80:0x00;
        }
}

Build it and voilà: you’ll have a new mame.exe listening for external messages.

I have such a binary build available for those who ask.

The next post should explain the whole MAME AVI Movie Maker built with AutoIt…

Regards (and feel free to leave comments)

Posted in Uncategorized

SQLite and obese debug information

Posted by alimbourg on June 2, 2009

SQLite.c is the ‘amalgamation’ version of the database: a single huge .c file containing everything needed to compile SQLite into our projects.

Using GCC with the standard MinGW compilation options, we have probably all noticed the ton of compilation warnings telling us that the debug info is too big to be handled correctly (if you tried it, you know it’s impossible to step through the SQLite sources with gdb).

The problem lies with the embedded debugging information in STABS format. GCC supports more modern debugging formats; we just have to tell it to use them (http://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html).

In your .pro file (assuming you’re using QtCreator), force these flags to be used:

#remove standard debug generation for c files
QMAKE_CFLAGS_DEBUG -= -g
#(*= 'if not present'), force uber debug infos, perfect for gdb
QMAKE_CFLAGS_DEBUG *= -ggdb

No more compilation warnings = quicker compilation, and we’re now able to step through sqlite.c in gdb… Another win ;-)

Posted in Uncategorized

Qt and latest MinGW

Posted by alimbourg on June 2, 2009

If you’re using QtCreator as a code IDE on Win32, you have probably noticed how sluggish debugging is overall…

Qt uses MinGW (v3.2 IIRC) to compile projects. It’s an old flavor of MinGW, and gdb.exe (the debugger) has grown up since then.

Go to http://sourceforge.net/project/showfiles.php?group_id=2435 to get:

or

That should do. Install all this into a directory.

In QtCreator, point to the MinGW you’re using via Tools->Options->Qt4->Qt Versions->select the one you’re using->MinGW directory

That’s it. The latest MinGW flavor seems to work for me, and gdb is not sluggish anymore.

Posted in Uncategorized

Qt, SQLite and Full Text Search

Posted by alimbourg on May 28, 2009

Dammit. Done.

I just managed to get my Qt application (a search-as-you-type tool for a source repository) to do some Full Text Search…

Qt is an excellent API to build applications with. It supports widgets, XML, databases, networking, etc.: all you need, really.

I just wanted to use SQLite as a database to browse and index a large source repository (it’s perfectly legal :) and Thunderbird does exactly the same with your mail, for example). SQLite has an extension for Full Text Search (FTS3: it indexes each word of the table columns). But since it’s an extension, you have to tell SQLite to embed it at compile time.

Qt offers SQLite services via its SQL driver framework, but Trolltech/Nokia didn’t ship the DLL with FTS3 support.

Miracle of open source (thanks Nokia): you can recompile the SQL driver to fit your needs, using the latest SQLite sources and/or defining the extensions it should embed.

This is how it works: from your Qt SDK installation, open D:\Qt\4.5.1\src\plugins\sqldrivers\sqlite\sqlite.pro with QtCreator (this is the SQLite SQL driver).

And add this, after the first DEFINES += :

DEFINES += SQLITE_ENABLE_FTS3 SQLITE_ENABLE_FTS3_PARENTHESIS

Optionally, change the source paths to a more recent version of the SQLite sources (the ‘amalgamation’ version).

Build and voilà, now you are able to do this in your projects:

if (!query.exec("Select Path, Snippet(FTS3Component, '**','**','') as Snippet "
"from FTS3Component WHERE "
"Content MATCH 'file_dump'"))

and

if (!query.exec("CREATE VIRTUAL TABLE FTS3Component"
" USING FTS3(Path TEXT, Content TEXT,"
" tokenize simple 'dummy' ' .;:!,[]')"))

That’s it, Happy Indexing

Posted in Code

QtCreator Debug Helper…

Posted by alimbourg on May 27, 2009

QtCreator (1.1 for now) is a great tool to work with…

The whole Qt thing is, too.

I’m sharing some experience with the beast.

If you need relevant debug info about the guts of the Qt classes, you need to instruct gdb on how to print them.

It’s done via a ‘debugging helper’: a DLL/plugin loaded by gdb on startup.

On my machine it didn’t work without a rebuild after each Qt SDK integration:

  • go to Tools->Options…->Qt4->Qt Versions,
  • check the active Qt SDK, and its path (QT_PATH).
  • *remove* the whole *QT_PATH/qtc-debugging-helper* directory via explorer
  • back to the Options Pane, select the active SDK, and press *Rebuild*
  • go to Options->Debugger->Debugging Helper, check ‘Use debugging helper’, uncheck ‘Use debugging helper from custom location’
  • next time you debug your Qt app, you should get a message like ’43 custom… loaded’ from gdb

Posted in QtCreator Qt debug win32 mingw

 