
      Federico Mena-Quintero: Fixing a memory leak of xmlEntityPtr in librsvg

      news.movim.eu / PlanetGnome · Yesterday - 00:57 · 7 minutes

    As of a few weeks ago, librsvg is in oss-fuzz — Google's constantly-running fuzz-testing service for OSS projects — and the crashes have started coming in. I'll have a lot more to say soon about crashes in Cairo, which is where the majority of the bugs are so far, but for now I want to tell you about a little bug I just fixed.

    The fuzzer found a memory leak that happens when librsvg tries to parse an invalid XML document that has definitions for XML entities — the things that you normally reference like &foo; in the middle of the XML.

    For example, this invalid document causes librsvg to leak:

    <!DOCTYPEY[<!ENTITY a ''
    

    Valgrind reports this:

    $ valgrind --leak-check=full ./target/debug/rsvg-convert leak.svg 
    ...
    Error reading SVG leak.svg: XML parse error: Error domain 1 code 37 on line 2 column 1 of data: xmlParseEntityDecl: entity a not terminated
    
    ==3750== 
    ==3750== HEAP SUMMARY:
    ==3750==     in use at exit: 78,018 bytes in 808 blocks
    ==3750==   total heap usage: 1,405 allocs, 597 frees, 205,161 bytes allocated
    ==3750== 
    ==3750== 247 (144 direct, 103 indirect) bytes in 1 blocks are definitely lost in loss record 726 of 750
    ==3750==    at 0x4845794: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
    ==3750==    by 0x4BD857F: xmlCreateEntity (entities.c:158)
    ==3750==    by 0x4BD932B: xmlNewEntity (entities.c:451)
    ==3750==    by 0x2EBC75: rsvg::xml::xml2_load::sax_entity_decl_cb (xml2_load.rs:152)
    ==3750==    by 0x4BED6D8: xmlParseEntityDecl (parser.c:5647)
    ==3750==    by 0x4BEF4F3: xmlParseMarkupDecl (parser.c:7024)
    ==3750==    by 0x4BEFB95: xmlParseInternalSubset (parser.c:8558)
    ==3750==    by 0x4BF50E9: xmlParseDocument (parser.c:11072)
    ==3750==    by 0x2ED266: rsvg::xml::xml2_load::Xml2Parser::parse (xml2_load.rs:466)
    ==3750==    by 0x4A8C49: rsvg::xml::XmlState::parse_from_stream::{{closure}} (mod.rs:628)
    ==3750==    by 0x2ACA92: core::result::Result<T,E>::and_then (result.rs:1316)
    ==3750==    by 0x34D4E2: rsvg::xml::XmlState::parse_from_stream (mod.rs:627)
    ==3750== 
    ==3750== LEAK SUMMARY:
    ==3750==    definitely lost: 144 bytes in 1 blocks
    ==3750==    indirectly lost: 103 bytes in 3 blocks
    ==3750==      possibly lost: 0 bytes in 0 blocks
    ==3750==    still reachable: 73,947 bytes in 746 blocks
    ==3750==         suppressed: 0 bytes in 0 blocks
    

    Let's see what happened.

    The code in question

    Even after the port to Rust, librsvg still uses libxml2 for parsing XML. So, librsvg has to deal with raw pointers coming in from libxml2, and it must manage their memory itself, since the Rust compiler doesn't know what to do with them automatically.

    Librsvg uses the SAX parser, which involves setting up callbacks to process events like "XML element started", or "an entity was defined".

    If you have a valid document that has entity definitions like these:

    <!ENTITY foo "#aabbcc">
    <!ENTITY bar "some text here">
    

    Then libxml2's SAX parser will emit two events to instruct your code that it should define entities, one for foo and one for bar, with their corresponding content. Librsvg stores these in a hash table, since it has to be able to retrieve them later when the SAX parser requests it. In detail, libxml2 requires that you create an xmlEntityPtr by calling xmlNewEntity() and then keep it around.

    xmlEntityPtr xmlNewEntity (xmlDocPtr      doc,
                               const xmlChar *name,
                               int            type,
                               const xmlChar *ExternalID,
                               const xmlChar *SystemID,
                               const xmlChar *content);
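
    To make the bookkeeping concrete, here is a rough sketch of what the entity-declaration SAX callback has to do: ask libxml2 to allocate the entity and stash the raw pointer in the hash table, keyed by the entity name. This is a simplified illustration, not librsvg's actual sax_entity_decl_cb; the binding is abridged (xmlChar is treated as a plain C char), error handling is omitted, and you would need to link against libxml2 to build it.

    use std::collections::HashMap;
    use std::ffi::CStr;
    use libc::{c_char, c_int, c_void};

    type XmlEntityPtr = *mut c_void;

    extern "C" {
        // Same prototype as shown above, with xmlChar simplified to c_char.
        fn xmlNewEntity(
            doc: *mut c_void,
            name: *const c_char,
            type_: c_int,
            external_id: *const c_char,
            system_id: *const c_char,
            content: *const c_char,
        ) -> XmlEntityPtr;
    }

    unsafe fn on_entity_decl(
        entities: &mut HashMap<String, XmlEntityPtr>,
        name: *const c_char,
        type_: c_int,
        content: *const c_char,
    ) {
        // libxml2 allocates the entity; from this point on, freeing it is our job.
        let entity = xmlNewEntity(
            std::ptr::null_mut(),
            name,
            type_,
            std::ptr::null(),
            std::ptr::null(),
            content,
        );

        // Store the raw pointer in the hash table so it can be looked up later
        // when the parser asks for the entity by name.
        let key = CStr::from_ptr(name).to_string_lossy().into_owned();
        entities.insert(key, entity);
    }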
    

    Later, you must free each of your stored entities with xmlFreeNode() (it supports different data types, including entities), or, if you are using libxml2 2.12.0 or later, with xmlFreeEntity().

    void xmlFreeNode (xmlNodePtr node);
    void xmlFreeEntity (xmlEntityPtr entity);
    

    Librsvg creates a SAX parser from libxml2, calls it to do the parsing, and then frees the entities at the end. In the following code, XmlState is the struct that librsvg uses to hold the temporary state during parsing: a partially-built XML tree, some counters on the number of loaded elements, the current element being processed, things like that. The build_document() method is called at the very end of XmlState's lifetime; it consumes the XmlState and returns either a fully-parsed and valid Document, or an error.

    struct XmlState {
        inner: RefCell<XmlStateInner>,  // the mutable part
    
        // ... other immutable fields here
    }
    
    type XmlEntityPtr = *mut libc::c_void;
    
    struct XmlStateInner {
        // ... a few fields for the partially-built XML tree, current element, etc.
        document_builder: DocumentBuilder,
    
        // Note that neither XmlStateInner nor XmlState implement Drop.
        //
        // An XmlState is finally consumed in XmlState::build_document(), and that
        // function is responsible for freeing all the XmlEntityPtr from this field.
        //
        // (The structs cannot impl Drop because build_document()
        // destructures and consumes them at the same time.)
        entities: HashMap<String, XmlEntityPtr>,
    }
    
    impl XmlState {
        fn build_document(
            self,
            stream: &gio::InputStream,
            cancellable: Option<&gio::Cancellable>,
        ) -> Result<Document, LoadingError> {
            // does the actual parsing with a libxml2 SAX parser
            self.parse_from_stream(stream, cancellable)?;
    
            // consume self, then consume inner, then consume document_builder by calling .build()
            let XmlState { inner, .. } = self;
            let mut inner = inner.into_inner();
    
            // Free the hash of XmlEntityPtr.  We cannot do this in Drop because we will
            // consume inner by destructuring it after the for() loop.
            for (_key, entity) in inner.entities.drain() {
                unsafe {
                    xmlFreeNode(entity);
                }
            }
    
            let XmlStateInner {
                document_builder, ..
            } = inner;
            document_builder.build()
        }
    }
    

    There are many Rust-isms in this code.

    • After doing the actual parsing with parse_from_stream(), self is destructured to consume it and extract its inner field, which is the actual mutable part of the XML loading state.

    • The code frees each xmlEntityPtr stored in the hash table of entities.

    • The inner value, which is an XmlStateInner, is destructured to extract the document_builder field, which gets asked to .build() the final document tree.

    Where's the bug?

    The bug is in this line at the beginning of the build_document() function:

            self.parse_from_stream(stream, cancellable)?;
    

    The ? after the function call returns errors to the caller. However, if there is an error during parsing, we will exit the function here, and it will not have a chance to free the values stored in entities. Memory leak!

    This code had already gone through a few refactorings. Initially I had an impl Drop for XmlState which did the obvious thing of freeing the entities by hand:

    impl Drop for XmlState {
        fn drop(&mut self) {
            unsafe {
                let mut inner = self.inner.borrow_mut();
    
                for (_key, entity) in inner.entities.drain() {
                    // entities are freed with xmlFreeNode(), believe it or not
                    xmlFreeNode(entity);
                }
            }
        }
    }
    

    But at one point, I decided to clean up the way the entire inner struct was handled, and chose to destructure it at the end of its lifetime, since that made the code simpler. However, destructuring an object means that you cannot have an impl Drop for it, because during the destructuring some fields are moved out individually and some are not. So, I moved the code that frees the entities directly into build_document(), as shown above.

    I missed the case where the parser can exit early due to an error.

    The Rusty solution

    Look again at how the entities hash table is declared in the struct fields:

    type XmlEntityPtr = *mut libc::c_void;
    
    struct XmlStateInner {
        entities: HashMap<String, XmlEntityPtr>,
    }
    

    That is, we are storing a hash table with raw pointers in the value part of the key-value pairs. Rust doesn't know how to handle those external resources, so let's teach it how to do that.

    The magic of having an impl Drop for a wrapper around an unmanaged resource, like xmlEntityPtr, is that Rust will automatically call that destructor at the appropriate time — in this case, when the hash table is freed.

    So, let's use a wrapper around XmlEntityPtr, and add an impl Drop for the wrapper:

    struct XmlEntity(XmlEntityPtr);
    
    impl Drop for XmlEntity {
        fn drop(&mut self) {
            unsafe {
                xmlFreeNode(self.0);
            }
        }
    }
    

    And then, let's change the hash table to use that wrapper for the values:

        entities: HashMap<String, XmlEntity>,
    

    Now, when Rust has to free the HashMap, it will know how to free the values. We can keep using the destructuring code in build_document() and it will work correctly even with early exits due to errors.
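
    As a usage sketch (a hypothetical helper, not the actual librsvg call site), the only change at insertion time is to wrap the raw pointer as soon as it comes back from libxml2; everything downstream, including the early-return path through ?, then cleans up automatically:

    fn store_entity(entities: &mut HashMap<String, XmlEntity>, name: String, raw: XmlEntityPtr) {
        // Wrap immediately; from here on the compiler tracks ownership, and
        // dropping the HashMap (even on an early error return from
        // build_document()) runs XmlEntity::drop for every value, which in
        // turn calls xmlFreeNode().
        entities.insert(name, XmlEntity(raw));
    }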

    Valgrind's evidence without the leak

    # valgrind --leak-check=full ./target/debug/rsvg-convert leak.svg 
    ==5855== Memcheck, a memory error detector
    ==5855== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
    ==5855== Using Valgrind-3.23.0 and LibVEX; rerun with -h for copyright info
    ==5855== Command: ./target/debug/rsvg-convert leak.svg
    ==5855== 
    Error reading SVG leak.svg: XML parse error: Error domain 1 code 37 on line 2 column 1 of data: xmlParseEntityDecl: entity a not terminated
    
    ==5855== 
    ==5855== HEAP SUMMARY:
    ==5855==     in use at exit: 77,771 bytes in 804 blocks
    ==5855==   total heap usage: 1,405 allocs, 601 frees, 205,161 bytes allocated
    ==5855== 
    ==5855== LEAK SUMMARY:
    ==5855==    definitely lost: 0 bytes in 0 blocks
    ==5855==    indirectly lost: 0 bytes in 0 blocks
    ==5855==      possibly lost: 0 bytes in 0 blocks
    ==5855==    still reachable: 73,947 bytes in 746 blocks
    ==5855==         suppressed: 0 bytes in 0 blocks
    

    Moral of the story

    Resources that are external to Rust really work best if they are wrapped at the lowest level, so that destructors can run automatically. Instead of freeing things by hand when you think it's right, let the compiler do it automatically when it knows it's right. In this case, wrapping xmlEntityPtr with a newtype and adding an impl Drop is all that is needed for the rest of the code to look like it's handling a normal, automatically-managed Rust object.

      viruta.org/fixing-xml-entity-leak.html


      Juan Pablo Ugarte: New Cambalache development release 0.91.1!

      news.movim.eu / PlanetGnome · 2 days ago - 15:31 · 1 minute

    I am pleased to announce a new development version of Cambalache.

    This comes with two major dependency changes. The first one is a very basic port to Adwaita, which fixes dark mode support with Gtk4. The biggest one is that I have replaced the WebKit WebView used to show widgets in the workspace with a custom Wayland compositor widget based on wlroots.

    So far, this is how Cambalache showed windows from a different process in its workspace:

    Workspace diagram using broadwayd and WebKit WebView

    It would run the broadwayd or gtk4-broadwayd backend, depending on the Gtk version of your project, and use a WebView to show all the windows.

    With the new approach we do not need the extra broadway backend and also we do not need to run a whole web browser just to show a window. On top of that we get all the obvious optimizations from using Wayland instead of a protocol meant to go over the internet.

    For example, with broadway, the client would render the window in memory, the broadway backend would compress the image and send it over TCP to the WebView, which then has to decompress it and render it on an HTML5 canvas using the JS API.

    But now, the client just renders the window into shared memory, which the compositor uses to create a cairo surface and render it directly in a GtkDrawingArea.

    And this is how the new Cambalache looks when editing its own UI. This also leaves room for improvement by leveraging all the new Gtk infrastructure for graphics offloading (see this blog post from Matthias about it).

    As usual with so many changes, I expect new bugs, so if you find any, please file them here.

    Special thanks go to emersion, kennylevinsen, vyivel and the wlroots community for their support and awesome project. I would not have been able to do this without wlroots and their help.

    Where to get it?

    You can get it from Flathub Beta

    flatpak remote-add --if-not-exists flathub-beta flathub.org/beta-repo/flathub-
    
    flatpak install flathub-beta ar.xjuan.Cambalache

    or checkout main branch at gitlab

    git clone https://gitlab.gnome.org/jpu/cambalache.git

    Matrix channel

    Have any questions? Come chat with us at #cambalache:gnome.org

    Mastodon

    Follow me on Mastodon @xjuan to get news related to Cambalache development.

    Happy coding!

      blogs.gnome.org/xjuan/2024/06/21/new-cambalache-development-release-0-91-1/


      Michael Meeks: 2024-06-21 Friday

      news.movim.eu / PlanetGnome · 2 days ago - 14:42

    • Up earlyish; mail chew, call with Aron & Anna.
    • Worked on slides for our first public Tea Time Training; hopefully we can get more people discussing LibreOfficeKit & COOL development weekly & interactively.
    • Lunch; J. drove us to Durham to meet up with H, finally a lovely sunny day; worked in the car.

      meeksfamily.uk/~michael/blog/2024-06-21.html


      Jussi Pakkanen: Advanced text features and PDF

      news.movim.eu / PlanetGnome · 2 days ago - 11:49 · 5 minutes

    The basic text model of PDF is quite nice. On the other hand its basic design was a very late 80s "ASCII is everything everyone really needs, but we'll be super generous and provide up to 255 glyphs using a custom encoding that is not in use everywhere else". As you can probably guess, this causes a fair bit of problems in the modern world.

    To properly understand the text that follows you should know that there are four different ways in which text and letters need to be represented to get things working:

    • Source text is the "original" written text in UTF-8 (typically)
    • Unicode codepoints represent unique characters as specified by the Unicode standard
    • A glyph id uniquely specifies a glyph (basically a series of drawing operations); these ids are arbitrary and typically unique to each font
    • ActualText is sort of like an AltText for PDF but uses UTF-16BE as was the way of the future in the early 90s

    Kerning

    The most common advanced typography feature in use is probably kerning, that is, custom spacing between certain letter pairs like "AV" and "To". The PDF text model has native support for kerning and it even supports vertical and horizontal kerning. Unfortunately the way things are set up means that you can only specify horizontal kerning when laying out horizontal text and vertical kerning for vertical text. If your script requires both, you are not going to have a good time.

    There are several approaches one can take. The simplest is to convert all text to path drawing operations, which can be placed anywhere with arbitrary precision. This works great for printed documents but also means that document sizes balloon and you can't copypaste text from the document, use screen readers or do any other operation that needs the actual text those shapes represent.

    An alternative is to render each glyph as its own text object with exact coordinates. While verbose, this works, but since every letter is separate, text selection becomes wonky again. PDF readers seem to have custom heuristics that try to detect these issues and fix text selection in post-processing. Sometimes it works better than at other times.

    Everything in PDF drawing operations is based on matrices. Text has its own transform matrix that defines where the next glyph will go. We could specify kerning manually with a custom translation matrix that translates the rendering location by the amount needed. There are two main downsides to this. First of all it would mean that instead of having a stream of glyphs to render, you'd need to define 9 floating point numbers (actually 6 due to reasons) between every pair of glyphs. This would increase the size of your output by a factor of roughly ten. The other downside is that unlike for all other matrices, PDF does not permit you to multiply an existing text state matrix with a new one. You can only replace it completely. So the actual code path would become "tell PDF to draw a glyph, work out what changes it would make to the currently active text matrix, undo that, multiply that matrix with one that has the changes that you wanted to happen and proceed to the next glyph".
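
    To make the "work out what changes it would make to the text matrix" step concrete, the displacement that showing a single glyph applies to the text matrix T_m is, for horizontal writing, roughly the following (my paraphrase of the PDF specification from memory, so verify against the spec before relying on it):

    t_x = \left( \left( w_0 - \frac{T_j}{1000} \right) T_{fs} + T_c + T_w \right) T_h,
    \qquad
    T_m' = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ t_x & 0 & 1 \end{bmatrix} T_m

    Here w_0 is the glyph's horizontal advance in text space, T_j the adjustment from a TJ array, T_{fs} the font size, T_c and T_w the character and word spacing, and T_h the horizontal scaling. Undoing this displacement and then applying your own is what the manual-kerning code path described above would have to do for every glyph.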

    Glyph substitution

    Most of the time (in most scripts anyway) source text's Unicode codepoints get mapped 1:1 to a font glyph in the final output. Perhaps the most common case where this does not happen is ligatures.

    The actual rules for when and how this happens are script, font and language dependent. This is something you do not want to do yourself; instead, use a shaping engine like Harfbuzz. If you give it the source text as UTF-8 and a font that has the ffi ligature, it will return a list of four glyph ids in the font to use, the way they map back to the original text, kerning (if any) and all of that good stuff.

    What it won't give you is the information of what ligatures it replaced your source text with. In this example it will tell you the glyph id of the ffi ligature (2132) but not which Unicode codepoint it corresponds to (0xFB03). You need to record that number in the PDF metadata for the text to work properly in copypaste operations. At first this does not seem like such a big problem, because we have access to the original font file and Freetype. You'd think you can just ask Freetype for the Unicode codepoint of a given font glyph, but you can't. There is a function for finding a glyph for a given Unicode codepoint but not the other way around. The stackoverflow-recommended way of doing this is to iterate over all glyphs until you find the one that is mapped to the desired codepoint. For extra challenge you need to write an ActualText tag in the PDF command stream so that when users copypaste that text they get the original form with each individual letter rather than the ffi Unicode glyph.

    All of this means that glyph lookup is basically an O(n^2) operation, when it is possible to do at all. Sometimes it isn't, as we shall now find out.

    Alternate forms

    OpenType fonts can have multiple different glyphs for the same Unicode codepoint, for example the small caps versions of Noto Serif look like this.

    These are proper hand-drawn versions of the glyphs, not ones obtained by scaling down upper case letters. Using these is simple: you tell Harfbuzz to use the small caps versions when shaping and then it does everything for you. For this particular font the upper case small caps glyphs are the same as the regular upper case glyphs. The lower case ones have their own glyphs in the font. However, according to Freetype at least, those glyphs are not mapped to any Unicode codepoint. Conceptually a small caps lower case "m" should be the same as a regular lower case "m". For some reason it is not and, unless I'm missing something, there is no API that can tell you that. The only way to do it "properly" is to track this yourself based on your input text and requirements.

    How does CapyPDF handle all this?

    In the same way pretty much all PDF generator libraries do: by ignoring all of it. CapyPDF only provides the means to express all underlying functionality in the PDF library. It is the responsibility of the client application to form glyph sequences and related PDF metadata in the way that makes sense for their application and document structure.

      nibblestew.blogspot.com/2024/06/advanced-text-features-and-pdf.html


      Sudhanshu Tiwari

      news.movim.eu / PlanetGnome · 3 days ago - 12:38 · 3 minutes

    GSoC 2024: Week 1-2 Report


    Project

    Add support for the latest GIR attributes and GI-Docgen formatting to Valadoc.

    Mentor

    Lorenz Wildberg

    Project Planning

    In Phase I of this project, our focus is on adding support for the latest GObject Introspection attributes to vapigen and the Vala compiler. Currently we are adding support for the glib:sync-func, glib:async-func, and glib:finish-func attributes for methods, and the default-value attribute for properties. To accomplish this, we need to understand how the GirParser builds the AST from the GIR data, and how the GirWriter writes into the GIR file using the AST. The latter is relatively easy, because the GirWriter simply visits all code nodes and prints each one of them into the GIR file. However, we found the former to be pretty challenging, which we will discuss in a bit!

    Week 1: Support for glib:sync-func, glib:async-func, and glib:finish-func attributes

    First, we went through Vala GIR files to decide how to implement support for these attributes:

    The glib:finish-func attribute for method

    Then we decided to begin by adding support for the glib:sync-func, glib:async-func and glib:finish-func attributes:

    • glib:sync-func : denotes the synchronous version of the asynchronous function.
    • glib:async-func : denotes the asynchronous version of a synchronous function.
    • glib:finish-func : denotes the finish function of the asynchronous function.

    In the GirParser: It is clear that the property coroutine of Vala.Method is true for both async-func and finish-func in the AST. This is because there is only an asynchronous function, but no finish function in Vala. We have implemented support for glib:finish-func by using this attribute to identify whether a node in the AST created by the GirParser is an async-func. With the introduction of this attribute we no longer need to guess the finish function corresponding to an async-func by appending "_finish" to the name of the async-func. We can easily determine the finish function from the glib:finish-func attribute using:

    m.finish_func = girdata["glib:finish-func"];

    In the GirWriter: To write these attributes into the GIR files, we made the GirWriter:
    • write the glib:async-func attribute if the method is a sync-func and the corresponding asynchronous function exists.
    • write the glib:sync-func attribute if the method is an async-func.
    We also added a new property to Vala.Method, finish_func, which, in addition to telling us the finish-func of the method, also tells us whether the method is an async-func (finish_func would be null if the method were not an async-func), thus making it possible to differentiate between async-func and finish-func in the GIR:

    Week 2: Support for default-value attribute for property

    As the name suggests, the default-value attribute gives the default value of a property.

    The default-value attribute for property

    It is worth mentioning that support for this attribute was already present in the GirWriter. So we only needed to implement support in the GirParser, and also ensure that the default value is output in the generated vapi files by adding support to the CodeWriter. We used the default-value attribute to set the initializer property of Vala.Property. We then used initializer to output the default value into the generated vapi files. This was a challenge to implement because the initializer is a Vala.Expression but the default-value that we get from the girdata is a string. So we need to convert the expression to a string in the CodeWriter and parse the string into an expression in the GirParser. We are trying to accomplish this by writing a function value_string_to_literal_expression in the GirParser:

    Apart from working on these two attributes and adding test cases for them, we also worked on merge request !374, which generates properties from getters and setters of compact classes in bindings. It needed some minor fixes and we are now closer to getting it merged.

    Challenges

    Now let's talk about the challenges, because after all, challenges are what make a project fun! As part of this project, we are working to add support for the latest GObject Introspection attributes in the GirParser. The GirParser works by building an AST of different code nodes in the code context. This involves parsing different symbols in the source file, and because there are so many symbols, the GirParser depends on over a hundred different files in the vala/vala directory, many of them with thousands of lines of code. We need to understand how all this is put together and results in the GirParser being able to parse GIR files. Currently we are trying to use a structured approach to code reading so that we can implement support for these attributes within the time we have to complete this project!

    Thanks for reading :) Stay tuned for more updates!

      sudhanshut.blogspot.com/2024/06/gsoc-2024-week-1-2-report-project-add.html


      Udo Ijibike: Outreachy Internship Blog Series: My Self Introduction

      news.movim.eu / PlanetGnome · 5 days ago - 20:00 · 2 minutes

    Hi! I’m Udo Ijibike, and I’m currently participating in a GNOME UX research project this summer as an Outreachy intern. My mentors are Allan Day and Aryan Kaushik. This is the first in a series of blog posts chronicling my internship.

    About Me

    I’m a passionate User Experience (UX) Designer from Nigeria. I love technology, and despite my degree in engineering, I’ve always had an appreciation for the arts and a keen interest in psychology. These seemingly disparate interests ultimately converge through UX Design in a way that’s incredibly fulfilling for me.

    My core values include self-awareness, progress, and curiosity. Self-awareness helps me set meaningful goals, focusing on progress keeps me grounded enough to enjoy the process, and curiosity keeps me open to new ideas.

    Why I Applied to Outreachy

    Outreachy is a platform that advocates for everyone’s potential to contribute meaningfully to science and technology, and its goal validates my aspirations as a UX Designer. I discovered Outreachy four months into my pivot to UX Design. This early in my transition, doubts were frequent, but learning about Outreachy strengthened my resolve to persevere.

    It took a few rounds of deliberation before I mustered the courage to apply. I feel honored to have been selected as an intern on my second attempt.

    Why GNOME

    The internship application had two stages: the initial application and the contribution period. Getting through the initial application was significant, but it was just the first step. However, I saw the opportunity to contribute to an impactful project and work with a mentor for an entire month as invaluable for learning. Therefore, I approached the contribution phase resolved to find a project where I could contribute and improve my UX Design skills in a meaningful way.

    I was drawn to GNOME because of its design-driven approach to solving complex problems in Free and Open Source Software. Its commitment to user-friendliness and inclusivity in FOSS made an impression on me, and the internship project itself was an ideal match for my objectives.

    Internship Project

    My internship project focuses on understanding user needs through UX research to enhance the usability of GNOME software.

    Over the next few months, we’ll be conducting user interviews, surveys, and user tests to derive actionable insights for improving select software within the GNOME ecosystem.

    Looking Forward

    I’m thrilled to be learning so much from contributing to a project that I’m excited about in such a welcoming community. As my internship continues, I’ll continue to share more about my Outreachy experience and our project in subsequent posts.

    Thank you for reading!

      blogs.gnome.org/udoijiibike/2024/06/18/outreachy-internship-blog-series-my-self-introduction/


      GNOME Accessibility Blog: Update on Newton, the Wayland-native accessibility project

      news.movim.eu / PlanetGnome · 5 days ago - 17:46 · 13 minutes

    Several months ago, I announced that I would be developing a new accessibility architecture for modern free desktops . Now, I’m happy to provide an update on this project, code-named Newton. Before I begin, I’d like to thank the Sovereign Tech Fund for funding this work, and the GNOME Foundation for managing the contract.

    A word on the name

    When choosing a working name for this project, I decided to follow the convention established by Wayland itself, and followed by a couple of other projects including the Weston compositor, of naming Wayland and related projects after places in New England. Newton, Massachusetts is the town where the Carroll Center for the Blind is located.

    Demo

    Here’s a demo of me using a modified GNOME OS image with a couple of GTK 4 apps running inside Flatpak sandboxes without the usual accessibility exception.

    Builds for testing

    The following builds are based on GNOME 46.2 with my current unmerged Newton modifications. The corresponding Git branches are linked below.

    I’ve also built a Flatpak repository, but it isn’t currently signed, so it doesn’t have a .flatpakrepo file. You can add it manually with this command:

    flatpak remote-add --user --no-gpg-verify newton https://mwcampbell.us/gnome/gnome-46-newton/repo/

    Because the Flatpak repository is based on GNOME 46, you can use Flatpak apps that were built for GNOME 46 with the Newton version of the org.gnome.Platform runtime. You can install that runtime with this command:

    flatpak install newton org.gnome.Platform

    Source repositories

    Here are the links to the unmerged Newton branches of the individual components:

    Here are my branches of the Buildstream metadata repositories, used to build the GNOME OS image and Flatpak runtime:

    • freedesktop-sdk
    • gnome-build-meta

    Only this last repository needs to be checked out directly. With it, one should be able to reproduce my builds.

    If you want to do your own builds of the relevant components, my addition to the Orca README has instructions. The Orca GitLab project linked above is also a good place to provide end-user feedback.

    What’s working so far

    I’ve now implemented enough of the new architecture that Orca is basically usable on Wayland with some real GTK 4 apps, including Nautilus, Text Editor, Podcasts, and the Fractal client for Matrix. Orca keyboard commands and keyboard learn mode work, with either Caps Lock or Insert as the Orca modifier. Mouse review also works more or less. Flat review is also working. The Orca command to left-click the current flat review item works for standard GTK 4 widgets.

    As shown in the recorded demo above, Newton-enabled applications can run inside a Flatpak sandbox without the usual exception for the AT-SPI bus, that is, with the --no-a11y-bus option to flatpak run. Support for such sandboxing was one of the major goals of this project.

    The standard GTK text widgets, including GtkEntry and GtkTextView, have fairly complete support. In particular, when doing a Say All command, the caret moves as expected. I was also careful to support the full range of Unicode, including emoji with combining characters such as skin tone modifiers.

    What’s broken or not done yet

    The GNOME Shell UI itself is not yet using Newton, but AT-SPI. The UI is still accessible with the Newton versions of Mutter and Orca, but it’s being accessed via AT-SPI, meaning that performance in this UI is not representative of Newton, and mouse review doesn’t work for this UI.

    Synthesizing mouse events isn’t yet supported on Wayland. This means that while the Orca command for left-clicking the current flat review item is expected to work for standard GTK 4 widgets, that command doesn’t work for widgets that don’t support the appropriate accessible action, and the right-click command doesn’t work.

    AccessKit doesn’t currently support sentences as text boundaries. This means that Orca’s Say All command falls back to reading by line, leading to unnatural breaks in the speech.

    The GTK AccessKit implementation doesn’t yet support out-of-tree text widgets that implement the GtkAccessibleText interface, such as the GTK 4 version of the vte terminal widget. This means that GTK 4-based terminal apps like kgx don’t yet work with Newton. I don’t yet know how I’ll solve this, as the current GtkAccessibleText interface is not a good fit for the push-based approach of AccessKit and Newton.

    Text attributes such as font, size, style, and color aren’t yet exposed. AccessKit has properties for these attributes, but the AccessKit AT-SPI backend, much of which is reused by the Newton AT-SPI compatibility library, doesn’t yet support them.

    Tables aren’t yet supported. AccessKit has properties for tables, and the GTK AccessKit backend is setting these properties, but the AccessKit AT-SPI backend doesn’t yet expose these properties.

    Some states, such as “expanded”, “has popup”, and “autocomplete available”, aren’t yet exposed.

    I’m aware that some GTK widgets don’t have the correct roles yet.

    When Caps Lock is set as the Orca modifier, you can’t yet toggle the state of Caps Lock itself by pressing it twice quickly.

    Orca is the only assistive technology supported so far. In particular, assistive technologies that are implemented inside GNOME Shell, like the screen magnifier, aren’t yet supported.

    Bonus: Accessible GTK apps on other platforms

    Because we decided to implement Newton support in GTK by integrating AccessKit, this also means that, at long last, GTK 4 apps will be accessible on Windows and macOS as well. The GTK AccessKit implementation is already working on Windows, and it shouldn’t be much work to bring it up on macOS. To build and test on Windows, check out the GTK branch I linked above and follow the instructions in its README. I’ve built and tested this GTK branch with both Visual Studio (using Meson and the command-line Visual C++ tools) and MSYS 2. I found that the latter was necessary for testing real-world apps like gnome-text-editor.

    Architecture overview

    Toolkits, including GTK, push accessibility tree updates through the new accessibility Wayland protocol in the wayland-protocols repository linked above. The latest accessibility tree update is part of the surface’s committed state, so the accessibility tree update is synchronized with the corresponding visual frame. The toolkit is notified when any accessibility clients are interested in receiving updates for a given surface, and when they want to stop receiving updates, so the toolkit can activate and deactivate its accessibility implementation as needed. This way, accessibility only consumes CPU time and memory when it’s actually being used. The draft Wayland protocol definition includes documentation with more details.

    Assistive technologies or other accessibility clients currently connect to the compositor through a D-Bus protocol, defined in the Mutter repository linked above. By exposing this interface via D-Bus rather than Wayland, we make it easy to withhold this communication channel from sandboxed applications, which shouldn’t have this level of access. Currently, an assistive technology can find out about a surface when it receives keyboard focus or when the pointer moves inside it, and can then ask to start receiving accessibility tree updates for that surface.

    The same D-Bus interface also provides an experimental method of receiving keyboard events and intercepting (“grabbing”) certain keys. This is essential functionality for a screen reader such as Orca. We had originally planned to implement a Wayland solution for this requirement separately, but decided to prototype a solution as part of the Newton project to unblock realistic testing and demonstration with Orca. We don’t yet know how much of this design for keyboard event handling will make it to production.

    The compositor doesn’t process accessibility tree updates; it only passes them through from applications to ATs. This is done using file descriptor passing. Currently, the file descriptors are expected to be pipes, but I’ve thought about using shared memory instead. The latter would allow the AT to read the accessibility tree update without having to block on I/O; this could be useful for ATs that run inside Mutter itself, such as the screen magnifier. (The current Newton prototype doesn’t yet work with such ATs.) I don’t know which approach is overall better for performance though, especially when one takes security into account.

    The serialization format for accessibility tree updates is currently JSON, but I’m open to alternatives. Obviously we need to come to a decision on this before this work can go upstream. The JSON schema isn’t yet documented; so far, all code that serializes and deserializes these tree updates is using AccessKit’s serialization implementation.

    In addition to tree updates, this architecture also includes one other form of serialized data: accessibility action requests. These are passed in the reverse direction, from the AT to the application via the compositor, again using file descriptor passing. Supported actions include moving the keyboard focus, clicking a widget, setting the text selection or caret position, and setting the value of a slider or similar widget. The notes about serialization of tree updates above also apply to action requests.

    Note that the compositor is the final authority on which tree updates are sent to the ATs at what time, as well as on which surface has the focus. This is in contrast with AT-SPI, where ATs receive tree updates directly from applications, and any application can claim to have the focus at any time. This is important for security, especially for sandboxed applications.

    Open architectural issues

    The biggest unresolved issue at this point is whether the push-based approach of Newton, the motivation for which I described in the previous post, will have unsolvable drawbacks, e.g. for large text documents. The current AccessKit implementation for GtkTextView pushes the full content of the document, with complete text layout information. On my brand new desktop computer, this has good performance even when reading an 800 KB ebook, but of course, there are bigger documents, and that’s a very fast computer. We will likely want to explore ways of incrementally pushing parts of the document based on what’s visible, adding and removing paragraphs as they go in and out of view. The challenge is to do this without breaking screen reader functionality that people have come to depend on, such as Orca’s Say All command. My best idea about how to handle this didn’t occur to me until after I had finished the current implementation. Anyway, we should start testing the current, naive implementation and see how far it takes us.

    The current AT protocol mentioned above doesn’t provide a way for ATs to explore all accessible surfaces on the desktop; they can only find out about an accessible surface if it receives keyboard focus or if the pointer moves inside it. A solution to this problem may be necessary for ATs other than Orca, or for automated testing tools which currently use AT-SPI.

    The current architecture assumes that each Wayland surface has a single accessibility tree. There isn’t yet an equivalent to AT-SPI’s plugs and sockets, to allow externally generated subtrees to be plugged into the surface’s main tree. Something like this may be necessary for web rendering engines.

    I’m not yet sure how I’ll implement Newton support in the UI of GNOME Shell itself. That UI runs inside the same process as the compositor, and isn’t implemented as Wayland surfaces but as Clutter actors (the Wayland surfaces themselves map to Clutter actors). So the existing AccessKit Newton backend won’t work for this UI as it did for GTK. One option would be for Mutter to directly generate serialized tree updates without going through AccessKit. That would require us to finalize the choice of serialization format sooner than we otherwise might. While not as convenient as using the AccessKit C API as I did in GTK, that might be the least difficult option overall.

    Newton doesn’t expose screen coordinates, for individual accessible nodes or for the surfaces themselves. ATs are notified when the pointer moves, but the compositor only gives them the accessible surface ID that the pointer is inside, and the coordinates within that surface. I don’t yet have a solution for explore-by-touch, alternative input methods like eye-tracking, or ATs that want to draw overlays on top of accessible objects (e.g. a visual highlight for the screen reader cursor).

    Next steps

    The earlier section on what’s broken or not done yet includes several issues that should be straightforward to fix. I’ll fix as many of these as possible in the next several days.

    But the next major milestone is to get my GTK AccessKit integration reviewed and merged. Since Newton itself isn’t yet ready to go upstream, the immediate benefit of merging GTK AccessKit support would be accessibility on Windows and macOS. The current branch, which uses the prototype Newton backend for AccessKit, can’t be merged, but it wouldn’t be difficult to optionally support AccessKit’s AT-SPI backend instead, while keeping the Newton version on an unmerged branch.

    The main challenge I need to overcome before submitting the GTK AccessKit integration for review is that the current build system for the AccessKit C bindings is not friendly to distribution packagers. In particular, one currently has to have rustup and a Rust nightly toolchain installed in order to generate the C header file, and there isn’t yet support for installing the header file, library, and CMake configuration files in FHS-compliant locations. Also, that build process should ideally produce a pkg-config configuration file. My current gnome-build-meta branch has fairly ugly workarounds for these issues, including a pre-generated C header file checked into the repository. My current plan for solving the nightly Rust requirement is to commit the generated header file to the AccessKit repository. I don’t yet know how I’ll solve the other issues; I might switch from CMake to Meson.

    The other major thing I need to work on soon is documentation. My current contract with the GNOME Foundation is ending soon, and we need to make sure that my current work is documented well enough that someone else can continue it if needed. This blog post itself is a step in that direction.

    Help wanted: testing and measuring performance

    I have not yet systematically tested and measured the performance of the Newton stack. To be honest, measuring performance isn’t something that I’m particularly good at. So I ask that Orca users try out the Newton stack in scenarios that are likely to pose performance problems, such as large documents as discussed above. Then, when scenarios that lead to poor performance are identified, it would be useful to have someone who is skilled with a profiler or similar tools help me investigate where the bottlenecks actually are.

    Other desktop environments

    While my work on Newton so far has been focused on GNOME, I’m open to working with other desktop environments as well. I realize that the decision to use D-Bus for the AT client protocol won’t be universally liked; I suspect that wlroots-based compositor developers in particular would rather implement a Wayland protocol extension. Personally, I see the advantages and disadvantages of both approaches, and am not strongly attached to either. One solution I’m considering is to define both D-Bus and Wayland protocols for the interface between the compositor and ATs, and support both protocols in the low-level Newton client library, so each compositor can implement whichever one it prefers. Anyway, I’m open to feedback from developers of other desktop environments and compositors.

    Conclusion

    While the Newton project is far from done, I hope that the demo, builds, and status update have provided a glimpse of its potential to solve long-standing problems with free desktop accessibility, particularly as the free desktop world continues to move toward Wayland and sandboxing technologies like Flatpak. We look forward to testing and feedback from the community as we keep working to advance the state of the art in free desktop accessibility.

    Thanks again to the Sovereign Tech Fund and the GNOME Foundation for making this work possible.


      Sam Thursfield: Status update, 2024-06-18

      news.movim.eu / PlanetGnome · 5 days ago - 15:59 · 4 minutes

    Podcasts

    If you’re into podcasts, the Blindboy Podcast is possibly the best one. Recent episode The State of the World begins on a pointy rock off the coast of Ireland that houses a 6th century monastery.

    Skellig Michael – Photo credit: Tristan Reville

    Then he raises the question of why, when the Russian government & the Russian military commit war crimes, US & European leaders apply punishing economic sanctions to Russia, and when the government and the military of Israel commit atrocities and war crimes, US & European leaders ask them nicely if they could please stop at some point, or at least commit slightly fewer of them.

    On that note, I’d like to shout out the Bands Boycott Barclays movement in the UK — our politicians have failed, but at least our musicians aren’t afraid to follow the money and stand up for human rights.

    Heisenbugs and spaghetti code

    In computer programming jargon, a heisenbug is a software bug that seems to disappear or alter its behavior when one attempts to study it. ( Wikipedia )

    In the year 2018 I was passing time in Barcelona waiting to start a new job (and a new life, as it turned out). It was the middle of July and everyone had left for the summer. Meanwhile GNOME had recently gained a Gitlab instance and for the first time we could run automated test pipelines in Gitlab CI, so I set up initial CI pipelines for Tracker and Tracker Miners. (Which, by the way, are undergoing a rename.)

    Tracker Miners already had a testsuite, but in 2018 it was only run when someone decided to run it locally. Remember that our filesystem indexer is implemented as many GIO async callbacks, which means it is complex spaghetti code. These callbacks can fire in a different order depending on various factors. Tracker responds to filesystem notifications and disk reads, which are unpredictable. And of course, it has bugs — in 2018 nobody much was funding maintenance of Tracker or Tracker Miners, and many of these bugs would trigger, or not, depending on the execution order of the internal async callbacks.

    So, many of the tests would just randomly fail, making the new CI rather useless.

    Thanks to a lot of hard work by Carlos and myself, we documented and fixed these issues and the indexer is in a much better state. It was not much fun! I would rather be eating spaghetti than tracing through spaghetti code. I guess somewhere I made bad life choices.

    Bowl of spaghetti

    In 2021, somehow not learning my lesson, I adopted the nascent openQA tests for GNOME OS that were developed by one of my Codethink colleagues. These have been running for 3 years now as a "beta" quality service, and have caught some interesting bugs along the way.

    In late 2023 we noticed a seemingly random test failure, where the machine under test makes it through initial setup but never gets to a user session. Let's call this issue 62.

    This issue reproduces a lot when you don't want it to, and rarely if ever when you do. Here's a recent example. See how the initial setup stage (gnome_welcome) passes, but the transfer to the final user session never happens:

    openQA test result with gnome_desktop test failing

    Random test failures make a test framework completely useless – if “test failed” can mean “your code is fine but fate is not on your side today”, then it’s not a useful signal for anyone. So for the GNOME OS automated testing to progress, we need to fix this intermittent failure.

    GNOME OS runs many different processes during startup, which can start in whatever order the systemd daemon decides to start them, and can then start child processes of their own. No two boots are exactly the same. The graphical startup is driven by systemd, GDM, gnome-session, gnome-initial-setup and GNOME Shell, context switching between themselves hundreds of times following D-Bus message traffic. You could say that this is an extra large portion of spaghetti, with toxic meatballs.

    As part of the STF engagement with GNOME OS, Neill Whillans spent a considerable amount of time comparing logs from good and bad boots, investigating the various processes that run at start-up time, and trying to catch the boot failure locally. I’ve tried my best to help as time permits (which it mostly does not).

    Fundamentally, the issue is that there are two gnome-shell processes running, one for the initial-setup user and one for the new ‘testuser’ which should take over the session. GDM is supposed to kill the initial-setup shell when the new ‘testuser’ shell begins. This doesn’t happen.

    We have a theory that it's something failing during the startup of the new gnome-shell. The layout mechanism looks very complex, so it's entirely possible that there is a heisenbug in there somewhere. As you would expect, enabling debug logs for GNOME Shell causes the issue to go away completely.

    There’s a lesson here, which is that excessive complexity kills motivation, and ultimately kills projects. We could have done quite a lot in the last 7 months if not for this issue. Let’s all try to keep our code simple, debuggable and documented. We have succeeded in the past at killing overly complex abstractions in GNOME — remember CORBA? Perhaps we need to do so again.

    What happens next? I am not sure – I think we may just have to give up on the end-to-end tests for the time being, as we can’t justify spending more of the current budget on this, and I’ve done enough volunteering for the time being – I plan to spend the summer evenings in a hammock, far away from any GNOME code, perhaps eating spaghetti.


      Debarshi Ray: Toolbx now enables the proprietary NVIDIA driver

      news.movim.eu / PlanetGnome · 6 days ago - 18:50 · 2 minutes

    … and why did it take so long for that to happen?

    If you build Toolbx from Git and install it to your usual prefix on a host operating system with the proprietary NVIDIA driver, then you will be able to use the driver on all freshly started Toolbx containers. Just like that. There’s no need to recreate your containers or to use some special option. It will just work.

    How does it work?

    Toolbx uses the NVIDIA Container Toolkit to generate a Container Device Interface specification on the host during the toolbox enter and toolbox run commands. This is a JSON or YAML description of the environment variables, files and other miscellaneous things required by the user space part of the proprietary NVIDIA driver. Containers share the kernel space driver with the host, so we don’t have to worry about that. This specification is then shared with the Toolbx container’s entry point, which is the toolbox init-container command running inside the container. The entry point handles the hooks and bind mounts, while the environment variables are handled by the podman exec process running on the host.

    It’s worth pointing out that right now this neither uses nvidia-ctk cdi generate to generate the Container Device Interface specification nor podman create --device to consume it. We may decide to change this in the future, but right now this is the way it is.

    The main problem with podman create is that the specification must be saved in /etc/cdi or /var/run/cdi, both of which require root access, for it to be visible to podman create --device. Toolbx containers are often used rootless, so requiring root privileges for hardware support, something that's not necessary on the host, will be a problem.

    Secondly, updating the toolbox(1) binary won’t enable the proprietary NVIDIA driver in existing containers, because podman create only affects new containers.

    Therefore, Toolbx uses the tags.cncf.io/container-device-interface Go APIs, which are also used by podman create, to parse and apply the specification itself. The hooks in the specification are a bit awkward to deal with. So, at the moment only ldconfig(8) is supported.

    The issue with nvidia-ctk is relatively minor and is that it's yet another separate binary. It makes error handling more difficult, and downstream distributors and users of Toolbx need to be aware of the dependency. Instead, it's better to directly use the github.com/NVIDIA/go-nvlib and github.com/NVIDIA/nvidia-container-toolkit Go APIs that nvidia-ctk also uses. This offers all the usual error handling facilities in Go and ensures that the dependency won't go missing.

    Why did it take so long?

    Well, hardware support needs hardware, and sometimes it takes time to get access to it. I didn't want to optimistically throw together a soup of find(1), grep(1), sed(1), etc. calls without any testing in the hope that it will all work out. That approach may be fine for some projects, but not for Toolbx.

    Red Hat recently got me a ThinkPad P72 laptop with an NVIDIA Quadro P600 GPU, which let me proceed with this work.

      debarshiray.wordpress.com/2024/06/17/toolbx-now-enables-the-proprietary-nvidia-driver/