
      Jan Lukas Gernert: NewsFlash 3.0

      news.movim.eu / PlanetGnome · 10:28 · 5 minutes

    The next version of NewsFlash is ready. And it comes packed with so many new features and speed improvements, plus a new look, that the jump to version 3 is more than justified.

    Visual comparison to version 2.3

    The most obvious difference is the use of the libadwaita 1.4 split views and toolbar views. The result is three columns, each slightly different in color and each with a borderless header bar.

    An awesome perk of using these widgets is nicer sizing behavior of the sidebar and the article list. With NewsFlash 2.3 these two columns stayed the exact same size at all times, and all extra space was given to the article. Now additional space is distributed evenly.

    Screenshots: old fullscreen layout and new fullscreen layout

    The article list features the new listview sections added in Gtk 4.12. Articles are grouped by day and the corresponding date is displayed as the section header.
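    A minimal sketch of the grouping logic behind such day sections, written in C (GTK's own language) and assuming article timestamps are non-negative Unix seconds sorted newest-first. The helper find_day_sections is hypothetical, not NewsFlash code, and it groups by UTC day, whereas a real article list would group by the user's local date:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Seconds per day; this sketch ignores time zones and DST. */
#define SECONDS_PER_DAY 86400

/* Hypothetical helper: given non-negative Unix timestamps sorted newest-first,
 * store the index where each day section starts in `starts` (up to
 * `max_starts` entries) and return the total number of sections. */
size_t
find_day_sections (const int64_t *timestamps, size_t n,
                   size_t *starts, size_t max_starts)
{
  size_t count = 0;

  assert (n == 0 || timestamps != NULL);
  for (size_t i = 0; i < n; i++)
    {
      int64_t day = timestamps[i] / SECONDS_PER_DAY;

      /* A section starts at the first article and whenever the day changes. */
      if (i == 0 || day != timestamps[i - 1] / SECONDS_PER_DAY)
        {
          if (count < max_starts)
            starts[count] = i;
          count++;
        }
    }
  return count;
}
```

    Each returned index is where a date header would be shown above the article at that position.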

    Drag & Drop is back

    And better than ever before.

    The feature didn’t survive the transition from Gtk3 to Gtk4. Back in the NewsFlash 1.x.x days, drag & drop could only be used to move a feed from one category to another, and the position of elements in the sidebar would seemingly change at random after the drop happened.

    A lot of work went into improving the experience of drag & drop. NewsFlash doesn’t try to keep the positions of items in sync with the backend service, but instead focuses on keeping things consistent locally. Items now move to the exact spot they were dragged to.

    Subscribing to feeds works better on mobile

    Clicking the plus icon used to spawn a popover that guided you through the process of adding either feeds, categories or tags.

    Unfortunately, this didn’t really work on mobile. Interacting with the on-screen keyboard would close the popover, making it impossible to complete the process.

    So instead the popover now lets you spawn one of 3 dialogs.

    Not only does this change make it possible to add feeds on mobile and other touch devices, but I would argue it is an improvement for desktop as well: the popover can no longer be closed accidentally by switching to another window.

    The wizard guiding you through the process also got a facelift.

    Speed

    NewsFlash 3.0 should be a lot faster.

    I went all-in on the tokio runtime and its ecosystem. No more spawning threads manually for a single operation. No more blocking locks. Everything is async and runs on a single tokio runtime, which is way smarter about task management than I ever could be.

    At the same time, the reqwest client is now reused as much as possible and only rebuilt when relevant settings change.

    Together these changes mean operations like syncing, marking articles as read and scraping full article content should feel more responsive.

    The favicon cache is now based on moka, which prevents multiple downloads of the same icon from being started at the same time. Large favicons get scaled down to a reasonable size to improve loading times.

    Remember the window state

    One of the more often requested features: remember the window state when NewsFlash quits and restore everything on the next launch.

    This can mean different things to different people: Some want the app to remember the window size, others if it displays all or only unread articles.

    Here is a list of the things that are saved and restored:

    • window size (and maximized)
    • sidebar selection
    • article list mode (all, unread, starred)
    • search term
    • selected article
    • whether original or scraped content is shown
    • article view zoom

    Other things like exact scroll positions of lists are not saved.

    View large images

    Large images in an article can be clicked to open them in a new window. No fancy gestures or anything, but still a welcome feature.

    The same dialog is used to display image attachments of articles.

    More Thumbnails

    Articles get more visually appealing with NewsFlash 3.0!

    Thumbnails were supported before. But only if the article had an image attachment. And then only if the back-end supported attachments via their API (e.g. miniflux did not).

    Going forward, there is a heuristic to gather a “relevant” image from the article HTML and use that as a thumbnail.

    Edit dialogs

    The simple “rename” dialogs of old evolved into “edit” dialogs. The biggest change is the dialog for feeds. The feed can still be renamed, the feed URL is at least visible now (changing it is not possible just yet), and the feed can be moved to another category from here as well (an alternative to drag & drop).

    Some math-focused blogs use “$” as the start and end marker for formulas. Always rendering text between “$” signs with MathJax led to malformed articles when the regular text contained multiple dollar signs. This can now be controlled on a feed-by-feed basis.

    An idea for the future is a feed-by-feed setting for whether the full article content should be downloaded right after a sync.

    Commafeed

    Commafeed is the newest addition to the list of supported services. It’s an open source, web-based feed reader.

    The implementation is not as battle-tested as the others, so if you hit a roadblock with it, please let me know.

    More

    Only the more significant features and changes were listed above. There are a lot of smaller features and bug fixes, so if you previously had trouble with anything, it may be worth trying NewsFlash again.

    Features that did not make it: Video player

    You cannot view video attachments or YouTube videos in a new embedded player. Video streaming is hard. GStreamer does a lot of the work for you, but there are still a lot of gaps that an application has to fill to make a nice streaming player.

    I was encountering image freezes, stuck videos and outright GStreamer crashes from time to time. Not to mention missing features like switching video streams on the fly, which I left out since it brought even more problems to the table.

    So it’s back to the drawing board. But here is how it could have landed in NewsFlash 3.0:


      Sebastian Wick: On the Usefulness of SO_PEERPIDFD

      news.movim.eu / PlanetGnome · Yesterday - 16:11 · 2 minutes

    Kernel 6.5 added a few new pidfd features: SCM_PIDFD and SO_PEERPIDFD. The idea behind them is the same as SCM_CREDENTIALS and SO_PEERCRED respectively. The only difference is that the pidfd variants return a file descriptor instead of a plain, numerical PID.

    A plain PID is a small number of type pid_t that is incremented for each new process and wraps around when too many processes have been created. This PID is usually used to look up some information about the process via files in /proc/$PID. While a process is looking up that information, it is possible that the process the PID initially referred to has terminated and a new process with the same PID has been created. The looked-up information is then incorrect, possibly resulting in a security vulnerability.

    A pidfd, on the other hand, always refers to one process and can be queried about the state of the process. This allows one to look up information from /proc/$PID without the race mentioned earlier. The SO_PEERPIDFD functionality in particular is interesting because it allows a service to query the pidfd of a connected client.
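    For contrast, here is a minimal, Linux-specific sketch of the older pattern: SO_PEERCRED returns the peer's numeric PID, which a service would then use to open files under /proc, and that is exactly where the race comes in. The helper name peer_pid is made up for illustration. SO_PEERPIDFD needs Linux 6.5 and suitably new headers, so it is only mentioned in a comment here:

```c
#define _GNU_SOURCE /* glibc only exposes struct ucred with _GNU_SOURCE */
#include <assert.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the PID of the process on the other end of a
 * connected Unix socket, or -1 on error. The PID is a plain number, so by the
 * time the caller opens /proc/<pid>/..., it may name a different process.
 * SO_PEERPIDFD (Linux 6.5+) hands back a pidfd instead and avoids that. */
pid_t
peer_pid (int sock)
{
  struct ucred cred;
  socklen_t len = sizeof cred;

  assert (sock >= 0);
  if (getsockopt (sock, SOL_SOCKET, SO_PEERCRED, &cred, &len) < 0)
    {
      perror ("getsockopt(SO_PEERCRED)");
      return -1;
    }
  return cred.pid;
}
```

    For a socketpair() created by a single process, both ends report that process's own PID, which makes the helper easy to try out locally.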

    Or so it seems. With Flatpak, Wayland compositors and D-Bus services also want to authenticate their clients, but they do not rely on this functionality. Instead, the preferred approach (implemented in Wayland as the security-context protocol, and still in discussion for D-Bus as the org.freedesktop.DBus.Containers1 interface) is to create a new Wayland or D-Bus socket for each application instance and make sure that those sockets are the only way to connect to the services (specifically, the “normal” host sockets must not be made available to the application instances). Flatpak is responsible for creating those sockets, attaching some metadata, and then mounting them. Currently the metadata is a triple: the sandboxing engine (flatpak, snap, …), the application id and the application instance. There are plans to extend this with further metadata.

    The question is: why add all of this complexity to authenticate a process when the pidfd approach is so much easier? It turns out that the hard part is knowing what to do with the PID, and where to look up the information that you need. For Flatpak, /proc/$PID/root/.flatpak-info is a file that cannot be changed from processes in the sandbox and contains, among other things, the Flatpak instance-id, which can be used to look up more data from $XDG_RUNTIME_DIR/.flatpak/$instance-id. For snap the whole process is very different, and other technologies like firejail have basically no way to do this lookup (don’t take my word on it).

    (Aside: xdg-desktop-portal does look up Flatpak information from /proc/$PID, and I just said it doesn’t use SO_PEERPIDFD, so why is this not broken? Flatpak makes sure there is a process in the sandbox that stays alive the entire time and acts as a proxy between the app and the D-Bus broker to enforce access control. The PID of every connection from inside the sandbox is the PID of the proxy. This is still technically racy, but it becomes much harder to pull off an attack. Implementing SO_PEERPIDFD in xdg-desktop-portal to get rid of the race entirely would be nice.)

    Implementing all kinds of different, subtle mechanisms to look up information from different sandbox engines you might not even know exist doesn’t scale, and that makes pidfd with SO_PEERPIDFD much less useful than one would expect in a lot of cases.


      Matthias Clasen: Paths in GTK, part 2

      news.movim.eu / PlanetGnome · Yesterday - 10:44 · 1 minute

    In the first part of this series, we introduced the concept of paths and looked at how to create a GskPath. But there’s more to paths than that.

    Path Points

    Many interesting properties of paths can change as you move along the trajectory of the path. To query such properties, we first need a way to pin down the point on the path that we are interested in.

    GTK has the GskPathPoint struct for this purpose, and provides a number of functions to obtain them, such as gsk_path_get_closest_point(), which lets you find the point on the path that is closest to a given point.

    Once you have a GskPathPoint, you can query the properties of the path at that point. The most basic property is the position, but you can also get the tangent, the curvature, or the distance from the beginning of the path.
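    To make the idea of a path point concrete, here is a C sketch for a single cubic Bézier curve: it finds the sample closest to a query point by brute force and then evaluates the position and tangent at that parameter. This is not how GTK implements gsk_path_get_closest_point(); the helper names are hypothetical, and brute-force sampling is simply the easiest technique to show:

```c
#include <assert.h>

typedef struct { double x, y; } Point;

/* Evaluate a cubic Bézier with control points p[0..3] at t in [0, 1]. */
Point
cubic_point (const Point p[4], double t)
{
  double u = 1.0 - t;
  double a = u * u * u, b = 3 * u * u * t, c = 3 * u * t * t, d = t * t * t;
  return (Point) { a * p[0].x + b * p[1].x + c * p[2].x + d * p[3].x,
                   a * p[0].y + b * p[1].y + c * p[2].y + d * p[3].y };
}

/* First derivative at t: the (unnormalized) tangent direction. */
Point
cubic_tangent (const Point p[4], double t)
{
  double u = 1.0 - t;
  double a = 3 * u * u, b = 6 * u * t, c = 3 * t * t;
  return (Point) { a * (p[1].x - p[0].x) + b * (p[2].x - p[1].x) + c * (p[3].x - p[2].x),
                   a * (p[1].y - p[0].y) + b * (p[2].y - p[1].y) + c * (p[3].y - p[2].y) };
}

/* Return the parameter of the sample closest to q, checking steps + 1 evenly
 * spaced samples. Squared distances are compared, so no sqrt() is needed. */
double
cubic_closest_t (const Point p[4], Point q, int steps)
{
  double best_t = 0.0, best_d2 = -1.0;

  assert (steps > 0);
  for (int i = 0; i <= steps; i++)
    {
      double t = (double) i / steps;
      Point pt = cubic_point (p, t);
      double dx = pt.x - q.x, dy = pt.y - q.y;
      double d2 = dx * dx + dy * dy;

      if (best_d2 < 0.0 || d2 < best_d2)
        {
          best_d2 = d2;
          best_t = t;
        }
    }
  return best_t;
}
```

    Once the parameter is known, any other property (here position and tangent) is just another evaluation at the same t, which is the role a GskPathPoint plays.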

    Input

    Another interesting question when using paths in a user interface is:

    Is the mouse pointer hovering over the path?

    You need the answer to this question if you want to highlight a path that the pointer is over, or if you want to react to the user clicking a path.

    For a filled path, GTK provides the answer with the gsk_path_in_fill() method.
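    As an illustration of what such a fill test does, here is a hypothetical C helper for the simplest case, a closed polygon made of straight lines, using the non-zero winding rule. gsk_path_in_fill() also has to deal with curves, which this sketch does not:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { double x, y; } Point;

/* Cross product: > 0 if q lies to the left of the directed edge a -> b. */
static double
is_left (Point a, Point b, Point q)
{
  return (b.x - a.x) * (q.y - a.y) - (q.x - a.x) * (b.y - a.y);
}

/* Non-zero winding test for a closed polygon; the last vertex connects back
 * to the first. Only straight edges are handled. */
bool
polygon_contains (const Point *v, size_t n, Point q)
{
  int winding = 0;

  assert (n >= 3);
  for (size_t i = 0; i < n; i++)
    {
      Point a = v[i], b = v[(i + 1) % n];

      if (a.y <= q.y)
        {
          if (b.y > q.y && is_left (a, b, q) > 0)
            winding++;  /* upward crossing with q on the left */
        }
      else if (b.y <= q.y && is_left (a, b, q) < 0)
        winding--;      /* downward crossing with q on the right */
    }
  return winding != 0;
}
```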

    For a stroked path, it is much more complicated to provide a 100% accurate answer (in particular, if the stroke is using a dash pattern), but we can provide an approximate answer that is often good enough: a point is inside the stroke if the distance to the closest point on the path is less than half the line width.
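    The approximation is easy to sketch for an open polyline. The C helper below is hypothetical: it compares the squared distance to each segment against the squared half line width, so no square root is needed:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { double x, y; } Point;

/* Squared distance from q to the segment a-b. */
static double
segment_dist2 (Point a, Point b, Point q)
{
  double dx = b.x - a.x, dy = b.y - a.y;
  double len2 = dx * dx + dy * dy;
  double t = 0.0;

  if (len2 > 0.0)
    {
      /* Project q onto the infinite line, then clamp to the segment. */
      t = ((q.x - a.x) * dx + (q.y - a.y) * dy) / len2;
      if (t < 0.0)
        t = 0.0;
      if (t > 1.0)
        t = 1.0;
    }
  double cx = a.x + t * dx - q.x, cy = a.y + t * dy - q.y;
  return cx * cx + cy * cy;
}

/* Approximate stroke hit test: q is "in" the stroke if its distance to the
 * closest point of the polyline is less than half the line width. Dash
 * patterns are ignored, and caps and joins effectively behave as round. */
bool
polyline_stroke_contains (const Point *v, size_t n, double line_width, Point q)
{
  double half = line_width / 2.0;

  assert (n >= 2);
  for (size_t i = 0; i + 1 < n; i++)
    if (segment_dist2 (v[i], v[i + 1], q) < half * half)
      return true;
  return false;
}
```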

    Outlook

    The next part of this series will look at rendering with paths.


      blog.gtk.org/2023/09/21/paths-in-gtk-part-2/


      Jussi Pakkanen: Circles do not exist

      news.movim.eu / PlanetGnome · 3 days ago - 12:57 · 2 minutes

    Many logos, drawings and other graphical designs have the following shape in it. What is this shape?

    If you thought "Ah-ha! I'm smart and read the title of this blog post, so I know that this is most definitely not a circle."

    Well, it is. Specifically, it is a raster image of a circle that I created with the Gimp just for this use.

    However, almost every "circle" you can see in printed media (and most purely digital ones) is not, in fact, a circle. Why is this?

    Since roughly the mid-80s, all "high quality" print jobs have been done either in PostScript or, nowadays almost exclusively, in PDF. They use the same basic drawing model, which does not have a primitive for circles (or circle arcs). The only primitives they have are straight line segments, rectangles and Bézier curves. None of these can be used to express a circle accurately. You can only do an approximation of a circle, but it is always slightly eccentric. The only way to create a proper circle is to have a raster image like the one above.
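    The size of that error is easy to check numerically. The sketch below assumes the common construction of a circle from four cubic Bézier quarter arcs, with the control points placed 4/3·(√2 − 1) ≈ 0.5523 times the radius along the tangents, and samples one quarter of a unit circle. The function name is made up for illustration:

```c
#include <assert.h>

/* Control-point distance for one quarter arc of a unit circle:
 * 4/3 * (sqrt(2) - 1). This puts the curve's midpoint exactly on the circle. */
#define KAPPA 0.5522847498307936

/* Sample the quarter arc from (1,0) to (0,1) and return the largest squared
 * distance from the origin. A true circle would give exactly 1.0 everywhere. */
double
bezier_quarter_max_r2 (int steps)
{
  /* Control points: (1,0), (1,KAPPA), (KAPPA,1), (0,1). */
  const double px[4] = { 1.0, 1.0, KAPPA, 0.0 };
  const double py[4] = { 0.0, KAPPA, 1.0, 1.0 };
  double max_r2 = 0.0;

  assert (steps > 0);
  for (int i = 0; i <= steps; i++)
    {
      double t = (double) i / steps, u = 1.0 - t;
      double b0 = u * u * u, b1 = 3 * u * u * t, b2 = 3 * u * t * t, b3 = t * t * t;
      double x = b0 * px[0] + b1 * px[1] + b2 * px[2] + b3 * px[3];
      double y = b0 * py[0] + b1 * py[1] + b2 * py[2] + b3 * py[3];
      double r2 = x * x + y * y;

      if (r2 > max_r2)
        max_r2 = r2;
    }
  return max_r2;
}
```

    The curve touches the circle at both ends and in the middle of the quarter, but bulges slightly outwards in between: with 1000 samples the largest squared radius comes out at roughly 1.0005, i.e. the radius overshoots by about 0.03%.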

    Does this matter in the real world?

    For printing, probably not. Almost nobody can tell the difference between a real circle and one that has been approximated with Bézier curves through just four points. Furthermore, the human vision system is a bit weird and perfect circles look vertically elongated. You have to make them non-circular for people to consider them properly circular.

    But suppose you want to use one of these things:

    This is a laser cutter that takes its "print jobs" as a PDF file and uses its vector drawing commands to drive the cutting head. This means that it is impossible to use it to print a wheel. You'd need to attach the output to a lathe and sand it down to be round so it actually functions as a wheel rather than as a vibration source.

    Again, one might ask whether this has any practical impact. For this case, again, probably not. But did you know that one of the uses PDF is being considered for (and, based on Internet rumors, is already being used for) is as an interchange format for CAD drawings? Now it suddenly starts mattering. If you have any component where getting a really accurate circle shape actually matters (like pistons and their holes), suddenly all your components are slightly misshapen. Which would not be fun.

    Extra bonus information

    Even though it is impossible to construct a path that is perfectly circular, PDF does provide a way to draw a filled circle. Here is the relevant snippet from the PDF 2.0 spec, subsection 8.5.3.2:

    If a subpath is degenerate (consists of a single-point closed path or of two or more points at the same coordinates), the S operator shall paint it only if round line caps have been specified, producing a filled circle centred at the single point.

    Who is willing to put money on the line that every PDF rendering implementation actually uses circles rather than doing the simple thing of approximating it with Béziers?


      Matthias Clasen: Paths in GTK

      news.movim.eu / PlanetGnome · 3 days ago - 02:18 · 2 minutes

    It is no secret that we want to get rid of cairo as the drawing API in GTK, so we can move more of our drawing onto the GPU.

    While people have found creative ways to draw things with render nodes, render nodes don’t provide a comprehensive drawing API like Skia or, yes, cairo. Not a very satisfying state of affairs.

    A few years ago, we started to investigate how to change this, by making paths available as first-class objects in GTK. This effort is finally starting to come to fruition, and you can see the first results in GTK 4.13.0.

    Paths

    So, what is a path? A rough definition could be:

    A sequence of line segments or curves that may or may not be connected at their endpoints.

    When we say curves, we specifically mean quadratic or cubic Bézier curves. Going beyond cairo, we also support rational quadratic Béziers (or as Skia calls them: conics), since they let us model circles and rounded rectangles precisely.
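    To see why the rational form matters, here is a small C sketch (not GTK code) that evaluates a rational quadratic Bézier for a quarter of the unit circle. It assumes the standard choice of control points (1,0), (1,1), (0,1) with a middle weight of cos(45°) ≈ 0.7071. With that weight every evaluated point lies on the circle up to floating-point rounding, which no ordinary polynomial Bézier can achieve:

```c
#include <assert.h>

/* Middle weight for a rational quadratic spanning a 90 degree arc: cos(45°). */
#define QUARTER_WEIGHT 0.7071067811865476

/* Evaluate the rational quadratic with control points (1,0), (1,1), (0,1)
 * and middle weight w at parameter t, storing the result in *x and *y. */
void
conic_point (double w, double t, double *x, double *y)
{
  double u = 1.0 - t;
  double b0 = u * u, b1 = 2.0 * w * u * t, b2 = t * t;
  double denom = b0 + b1 + b2; /* dividing by this makes the curve rational */

  assert (denom > 0.0);
  *x = (b0 * 1.0 + b1 * 1.0 + b2 * 0.0) / denom;
  *y = (b0 * 0.0 + b1 * 1.0 + b2 * 1.0) / denom;
}
```

    With w = 1 the same formula degenerates to an ordinary quadratic Bézier, which does not stay on the circle.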

    This picture shows a typical path, consisting of 4 curves and 2 lines, some of which are connected. As you can see, paths can be closed (like the 4 curves here) or open (like the 2 lines), with a start point and an endpoint.

    And how are paths useful for drawing? First, you can use a path to define an area (the part that’s inside the path) and fill it with a color, a gradient or some more complex content.

    Alternatively, you can stroke the path with various properties such as line width, color or dash pattern.

    Paths in GTK

    The object that we use for paths in GTK is GskPath. It is a compact, immutable representation that is optimized for rendering. To create a GskPath, you need to use a GskPathBuilder, which has many convenience methods to create paths, either from individual curves or from predefined shapes.

    This example creates a path that is a closed triangle:

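    GskPathBuilder *builder;
    GskPath *path;

    builder = gsk_path_builder_new ();
    gsk_path_builder_move_to (builder, 0, 50);   /* bottom-left corner */
    gsk_path_builder_line_to (builder, 100, 50); /* bottom-right corner */
    gsk_path_builder_line_to (builder, 50, 0);   /* apex */
    gsk_path_builder_close (builder);            /* back to the start */
    path = gsk_path_builder_free_to_path (builder);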

    And this one creates a circular path with the given center and radius:

    builder = gsk_path_builder_new ();
    /* center is a const graphene_point_t *, radius a float */
    gsk_path_builder_add_circle (builder, center, radius);
    path = gsk_path_builder_free_to_path (builder);

    Outlook

    In the next post, we’ll look at properties of paths, and how to query them.


      blog.gtk.org/2023/09/19/paths-in-gtk/


      Georges Basile Stavracas Neto: Protected: Extending the month to infinity

      news.movim.eu / PlanetGnome · 7 days ago - 20:30

    This post is password protected. You must visit the website and enter the password to continue reading.


      feaneron.com/2023/09/15/extending-the-month-to-infinity/


      Alice Mikhaylenko: Libadwaita 1.4

      news.movim.eu / PlanetGnome · 7 days ago - 17:45 · 5 minutes

    Screenshot showing a few apps using libadwaita 1.4, front to back: Files, Characters, Epiphany

    It’s that time of year again, so let’s look at what’s new.

    New Adaptive Widgets

    I’ve already talked about them in my last blog post, so I won’t go into details this time.

    Breakpoints

    Libadwaita 1.4 introduces a breakpoint system, allowing you to change the UI in arbitrary ways depending on the window size. Breakpoints can be used with AdwWindow, AdwApplicationWindow, or with AdwBreakpointBin if you need more control.

    Breakpoints can be used in a fully declarative way from UI files, for example:

    <object class="AdwBreakpoint">
      <condition>max-width: 500sp</condition>
      <setter object="split-view" property="collapsed">True</setter>
    </object>
    

    As a tradeoff, you have to manually specify the window’s or bin’s minimum size and ensure its contents actually fit, same as you do on a small screen.

    To help with that, GtkButton, GtkMenuButton, AdwSplitButton and AdwButtonContent now all include a :can-shrink property to enable text ellipsizing, while widgets like AdwBanner automatically enable it for their buttons in order to not get uncontrollably wide.

    For breakpoint conditions one can use pixels (px), points (pt) or a new sp unit (scalable pixels, name lifted from Android), which is equivalent to pixels with the default text scale, but scales with it: 1sp is equivalent to 1.25px with Large Text enabled, and so on. To accommodate different text scale factors better, it is recommended to use sp whenever it’s feasible.
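    A minimal sketch of the unit arithmetic, assuming sp scales linearly with the text scale factor (1.0 by default, 1.25 with Large Text, matching the 1.25px figure above) and that a max-width condition matches when the width is at most the limit. Both helper names are hypothetical; libadwaita evaluates conditions internally:

```c
#include <assert.h>
#include <stdbool.h>

/* Convert scalable pixels to pixels for a given text scale factor. */
double
sp_to_px (double sp, double text_scale)
{
  assert (text_scale > 0.0);
  return sp * text_scale;
}

/* Hypothetical evaluation of "max-width: <limit_sp>sp" for a window that is
 * window_width_px pixels wide; assumes the limit itself is inclusive. */
bool
max_width_sp_matches (double window_width_px, double limit_sp, double text_scale)
{
  return window_width_px <= sp_to_px (limit_sp, text_scale);
}
```

    For example, a 600px-wide window does not match max-width: 500sp at the default text scale, but does once Large Text turns 500sp into 625px, so the layout collapses earlier for users who need bigger text.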

    Navigation View

    Screenshot of a small window with Page 2 in the title, a back button on the left and an "Open Page 3" button in the middle.

    AdwNavigationView is an integrated widget implementing the browsing pattern, replacing AdwLeaflet with can-unfold=false. It provides a navigation stack that can be populated statically (e.g. from a UI file) or dynamically, and automatically provides gestures and shortcuts.

    It also provides the navigation.push and navigation.pop actions, allowing you to push pages directly from a UI file:

    <object class="AdwActionRow">
      <property name="title" translatable="yes">_Details</property>
      <property name="use-underline">True</property>
      <property name="activatable">True</property>
      <property name="action-name">navigation.push</property>
      <property name="action-target">"details"</property>
      <child>
        <object class="GtkImage">
          <property name="icon-name">go-next-symbolic</property>
          <property name="accessible-role">presentation</property>
        </object>
      </child>
    </object>
    

    To further simplify using it, AdwHeaderBar can automatically show the correct title for each navigation page, as well as a back button to pop the current page when appropriate.

    Automatic back buttons also provide a context menu that allows you to pop multiple pages at once:

    Screenshot of a back button context menu in Settings, opened on the Night Light page and showing Displays and Settings items. Displays is hovered

    This still works with nested navigation views, as well as with navigation views combined with split views.

    Split Views

    Screenshot of a window using a split view with prominent "Sidebar" and "Contents" labels in each pane and nothing else (except for a close button in the upper right corner)

    While AdwNavigationView replaces the can-unfold=false case of AdwLeaflet, AdwNavigationSplitView replaces the other one.

    It has two children: sidebar and content, and it displays them side by side. When the :collapsed property is set to TRUE, it literally turns into an AdwNavigationView. It doesn’t set it automatically though – you are supposed to do it from your breakpoints as needed.

    It also provides more sophisticated sizing for the sidebar, based on a percentage of the split view’s total width.
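    The sizing rule can be sketched as a fraction of the total width clamped between a minimum and a maximum. This is an assumption about how the percentage-based sizing behaves, not libadwaita's code, and the example numbers used below (a fraction of 0.25 and limits of 180 and 280) are illustrative assumptions as well:

```c
#include <assert.h>

/* Hypothetical sidebar sizing: a fraction of the total width, clamped to
 * [min_width, max_width]. Not libadwaita's implementation. */
double
sidebar_width (double total_width, double fraction,
               double min_width, double max_width)
{
  assert (min_width <= max_width);
  double w = total_width * fraction;

  if (w < min_width)
    return min_width;
  if (w > max_width)
    return max_width;
  return w;
}
```

    With these numbers the sidebar stays at its minimum in narrow windows, grows with the window in the middle range, and stops growing at its maximum, which is the "more space for every column" behavior that a fixed-width sidebar cannot give.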

    Meanwhile, AdwOverlaySplitView is similar, but instead of turning into a navigation view when collapsed, it overlays the sidebar on top of the content, not unlike AdwFlap. As such, it replaces AdwFlap.

    It has a few extra features compared to the navigation split view, such as the ability to move the sidebar to the right and to show or hide it even when not collapsed, but the two widgets have extremely similar APIs.

    And, like with AdwNavigationView, AdwHeaderBar can integrate with split views: when put inside one, it will automatically hide redundant window buttons, so there’s no need to show or hide them manually like with AdwLeaflet or AdwFlap.

    Toolbar View

    The new split view styles really need flat header bars to work well. While we’ve had the .flat style class since libadwaita 1.0, in practice it’s quite limited, especially with scrolling content.

    As such, there’s a new widget called AdwToolbarView. It contains a content widget and a number of top and bottom bars (for example, AdwHeaderBar, AdwTabBar, GtkSearchBar, GtkActionBar, or GtkBox with the .toolbar style class). Then it will automatically manage the correct styles for the toolbars, for example making them flat and managing undershoot shadows on scrolling content (though this can be changed using the :top-bar-style and :bottom-bar-style properties), as well as collapsing spacing between them:

    Screenshot of GNOME Text Editor 45, showing a header bar and a tab bar inside a toolbar view

    It’s recommended to always use it instead of GtkBox when you have header bars or other toolbars, regardless of whether you’re using split views.

    Deprecations

    With breakpoints and the new widgets, a number of older widgets have been deprecated, namely AdwLeaflet, AdwFlap, AdwSqueezer and AdwViewSwitcherTitle, as well as the old subpage API in AdwPreferencesWindow and the .flat style class for header bars. Refer to the migration guide for how exactly to replace them.

    List Rows

    There have been a number of boxed list additions this cycle.

    Switch Row

    Screenshot of a switch row in a group with the title "Switch Rows". The row has a title "Switch Row" and the switch is active.

    Joshua Lee added AdwSwitchRow – a simple AdwActionRow subclass containing a GtkSwitch. While it’s easy to implement manually, it’s a very common case and so it’s nice to have a shortcut.

    Spin Row

    Screenshot of a spin row in a group with the title "Spin Rows". The row has a title "Spin Row" and the value is 50. Plus and minus buttons are both sensitive.

    Chris added AdwSpinRow – a list row with an embedded GtkSpinButton, similar to AdwEntryRow.

    Property Row

    Screenshot of a property row in a group with the title "Property Rows". The row has a title "Property Row" and the subtitle "Value".

    While it’s not a widget, the new .property style class, also by Chris, can swap styles on AdwActionRow’s title and subtitle to emphasize the latter. This can be useful when displaying, say, EXIF properties in an image viewer.

    Misc Changes


    As always, thanks to all the contributors who helped to make this release happen.


      blogs.gnome.org/alicem/2023/09/15/libadwaita-1-4/


      Sam Thursfield: Status update, 15/09/2023

      news.movim.eu / PlanetGnome · 7 days ago - 11:06 · 7 minutes

    Musically this has been a fun month. One of my favourite things about living in Galicia is that ska-punk never went out of fashion here and you can legitimately go to a festival by the sea and watch Ska-P. Unexpectedly brilliant and chaotic live show. I saw an interview recently where Angelo Moore of Fishbone was asked by a puppet what his favourite music is, and he answered: “I like … I like the Looney Tunes music”. Same energy.

    I already wrote this month about my DIY media server and the openQA CLI tool. This post contains some brief thoughts about Nushell and then some lengthy thoughts about the future of the Web. Enjoy!

    Nushell everywhere

    I read a blog post by serial shell innovator JT entitled “The case for Nushell”. I’ve been using Nushell for data-focused work for a while and the post inspired me to make it my default shell in a few places.

    Nushell is really comfortable to use these days; it’s very addictive the first time you construct a one-liner to pretty-print some JSON or XML, select the fields you want and output a table as Markdown that you can paste straight into a GitLab issue. My only complaint is that the autocomplete isn’t quite as good as the Fish shell’s yet. (And that you can’t type rm -R …: chown and chmod only accept -R, and now rm only accepts a lowercase -r. How am I supposed to remember that, guys???)

    I have a load of arcane Bash knowledge that I guess I’ll have to hang onto for a while yet, particularly as my job mostly involves SSH’ing into strange old machines from the 1990s. Perhaps I can try and contribute Nushell binaries that run on HP-UX and Solaris. (For the avoidance of doubt, that previous sentence is a joke).

    Kagi Small Web

    There’s a new search engine on the block called Kagi, which is marketed as a “premium search engine”: you pay $0.05 per search, and in return the results are ad-free.

    I like this idea. I signed up for the free trial of 100 searches, and I haven’t got far through them.

    It turns out most of the web searches I do are things I could search on a specific site if I wasn’t so lazy. For example, I search “rust stdio” when I could go to the Rust documentation on my local machine and search there. Or I search for a programming problem when I could clearly just search StackOverflow itself. DuckDuckGo has made me lazy; adding a potential $0.05 cost to each search makes you realize how few you actually need to do. Maybe this is a good thing.

    Anyway, Kagi. They just launched something named Kagi Small Web, which is announced here:

    Kagi Small Web offers a fresh approach by promoting recently published content from the “small web.” We gather new content, published within the last week, from a handpicked list of blogs and surface it in multiple ways:

    • Directly within Kagi search results for applicable queries (existing Kagi members do not need to do anything, this will be automatic)
    • Via the new Kagi Small Web website
    • Through the Kagi Small Web RSS feed
    • Via our Search API, where results are now part of the news enrichment API

    Initially inspired by a vibrant discussion on Hacker News, we began our experiment in late July, highlighting blog posts from HN users within our search results. The positive feedback propelled the initiative forward. Today, our evolving concept boasts a curated list of nearly 6,000 genuine websites featuring people with a wide variety of interests.

    When I first saw this my mind initially jumped to the problematic parts. Who are these guys to suddenly define what the Small Web is, and define it as a club of some 6,000 websites chosen by Hacker News? All sites must be in English, so is the Web only for English speakers now?? More importantly, why is my site not on the list? Why wasn’t I consulted??

    There’s also something very inspiring about the project. I try to follow the rule “something is better than nothing”, and this project is a pretty bold step forwards, which inspired a bunch of thoughts about the future of The Web.

    Google Search is Dying

    Since about 2000, when you think of the Web, you think of Google.

    Google Search has been dying a slow, public death for about the last ten years. Google has been too big to innovate since the early 2010s (with one important exception, the Emoji Kitchen).

    Google Search remained king until now for two reasons: one, their tech for turning hopelessly vague search queries into useful results was better than anyone’s in the industry, and two, as of 2023, almost nobody else can operate at the scale needed to index all of the text on the Web.

    I guess there’s a third reason too, which is spending billions of $$$ to be the default search provider nearly everywhere, to the point that the USA is running an antitrust hearing against them, but let’s focus on the technical aspects.

    The current fad for large language models is going to bring big changes to the Web, for better or worse. One of those is that “intent analysis” is suddenly much easier than it was. Note, I’m not talking about prompting an LLM with a question and presenting the randomly generated output as an answer. I’m talking about taking unstructured text, such as “trains to London”, and turning it into an actionable query. A 1990s-era search engine would throw away the “to” and return any website that contained “trains” and “London”. Google Search shows a table of live departure times for trains heading to London. (With some less useful things above and below, since this is the Google Search of 2023.)

    A small LLM such as Vicuna can kinda just DO this stuff, not perfectly of course, but it’s an order of magnitude easier than a decade ago. Perhaps Google kept their own LLM research internal for so long for fear of losing exactly this edge? The “We have no moat” memo suggests fear.

    Onto the second thing, indexing all the content on the Web. LLMs don’t make this easier. They make it impossible.

    It’s now so easy to generate human-like text on the Web using machines that it doesn’t make sense to index all the text on the Web any more. Much of it is already human-generated garbage aiming to game search ranking algorithms (see “A storefront for robots” for fun examples).

    Very soon 99% of text on the web will be machine-generated garbage. Welcome to the dark forest.

    For a short time I was worried about this, but I think it’s a natural evolution of the Web. This is the end of the Olde World Wide Web. What comes next?

    There is more than one Small Web

    If you’ve read this far, firstly, thanks and well done: in 2023 it’s hard to read so many paragraphs in one go! I didn’t even put in a single video.

    Let me share the insight I had while thinking over Kagi Small Web. Maybe it’s obvious and maybe it isn’t.

    A search engine of 6,000 websites is small-scale enough that one person could conceivably run it.

    Let’s go back a step. How do you deal with a Web that’s 99% machine-generated noise? I imagine Google will try to solve this by using language models to detect if the page was generated by a language model, triggering another fairly pointless technological arms race against the spammers who will be generating this stuff. This won’t work very well.

    The only way for humans to make sense of the new Dark Forest Web is to have lists of websites, curated by humans, and to search through those when we want to find information.

    If you’re old you know that this isn’t a new idea. In fact, we kinda had all of this stuff in the form of web rings, link pages on other people’s websites, bookmarking services, link sites like Digg, Fark and Reddit, RSS feeds and feed readers. If you look at the Kagi Small Web reader site, it’s literally a web ring. It’s StumbleUpon. It’s Planet GNOME. But through the lens of 2023, it’s also something new.

    So I’m not going to start reading Kagi’s small web, though it may be great. And I’m going to stop capitalising “small web”, because I think we’re going to curate millions of these collectively, in thousands of languages, in infinite online communities. We’re going to have open source tools for searching and archiving high quality online content. Who knows? Perhaps in 10 years we’ll have small web search tools integrated into GNOME.

    Further Reading

    This year, 2023, is the 25th Year of our Google, and The Verge are publishing a series of excellent articles looking forwards and backwards. I can recommend reading “The end of the Googleverse” as a starting point. Another great one: “Google and YouTube are trying to have it both ways with AI and copyright”.


      Pratham Gupta: GSOC 2023 Final Report

      news.movim.eu / PlanetGnome · Thursday, 14 September - 12:54 · 3 minutes

    This is the final report for my project. Here I will explain the method we used to find anagrams.

    GNOME Crosswords Editor

    Although still under development, the Editor is an important part of the Crosswords application for GNOME. It allows us to create basic crosswords with grids and clues.

    Project Information

    My project is to add anagram-search support to the Crosswords Editor. The data for the search comes from a word-list file. My task is to search it for anagrams. The search needs to be fast enough that when the user enters the input word, the results are displayed instantly (without any lag).

    The word-list file

    To understand my project, you need to know about the data file, i.e. the word-list file. The file is made up of 3 sections:

    1. WordList: Just a huge list of words.
    2. FilterFragments: Lists of word indices for words that follow a particular pattern. For example, A?? is a pattern, and its list holds the indices of words that start with A and have a length of 3. Similarly, for the fragment ?B?? the list holds the indices of words whose second letter is B and whose length is 4.
    3. Index Section: A JSON block which stores indexes for the above sections.
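    To illustrate what a FilterFragments list contains, here is a minimal Python sketch that computes one on the fly. The helper names matches and fragment_indices and the sample words are hypothetical; the real word-list file stores these lists precomputed instead of scanning the words on every query.

```python
# "?" matches any letter; every other character must match exactly,
# and the fragment also fixes the word length.
def matches(fragment, word):
    return len(fragment) == len(word) and all(
        f == "?" or f == c for f, c in zip(fragment, word))

def fragment_indices(fragment, word_list):
    # Indices (not the words themselves) of every word matching the fragment
    return [i for i, w in enumerate(word_list) if matches(fragment, w)]

words = ["ABC", "ABBA", "ACE", "EBBS", "BAT"]
print(fragment_indices("A??", words))   # [0, 2]
print(fragment_indices("?B??", words))  # [1, 3]
```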

    Finding Anagrams

    Approach 1

    We simulate a trie to find anagrams of a word.

    Trie to search for anagrams

    At each node we maintain a list of the words that follow that node's pattern; for example, the list for AT? holds the words that have the prefix AT and a length of 3. For branches where this list becomes empty (as for AT? in the figure), we perform branch culling: no word can be completed along that branch, so we stop exploring it.
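    A minimal Python sketch of this prefix walk with branch culling follows. It is an illustration rather than the Editor's code: it keeps the candidate words in a plain Python list instead of using the FilterFragments section, and the function name anagrams_by_prefix is hypothetical. It also returns the input word itself when that word is in the list.

```python
from collections import Counter

def anagrams_by_prefix(word, word_list):
    """Extend prefixes letter by letter, like descending a trie, and cull
    any branch whose word list (same length, same prefix) becomes empty."""
    word = word.upper()
    n = len(word)
    results = []

    def walk(prefix, remaining, pool):
        if not pool:
            return  # branch culling: no word can be completed from here
        if len(prefix) == n:
            results.append(prefix)  # every word left in pool equals prefix
            return
        for letter in sorted(remaining):
            if remaining[letter] == 0:
                continue  # this letter is already used up
            remaining[letter] -= 1
            nxt = prefix + letter
            walk(nxt, remaining, [w for w in pool if w.startswith(nxt)])
            remaining[letter] += 1

    # Root node: every word with the same length as the input
    walk("", Counter(word), [w for w in word_list if len(w) == n])
    return results

words = ["EARTH", "HATER", "HEART", "HEARD", "TEA"]
print(anagrams_by_prefix("heart", words))  # ['EARTH', 'HATER', 'HEART']
```

    Every extra letter multiplies the number of nodes that may need visiting, which is one reason a walk like this gets slow for long words.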

    This approach does find the anagrams of a word, but it is too slow to be used directly in a user interface: for a 12-letter word like ABBREVIATION, it took 0.3 seconds to find the anagrams. So we needed a faster method.

    Approach 2

    Basic Idea

    I will try to explain this with an example:

    • Two words: HEART and EARTH
    • Sort the letters of each word and we get AEHRT and AEHRT
    • Create a hash of each sorted word
    • Notice that the two hashes are the same, because the sorted words are the same
    • We use this property to find anagrams.
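    The same idea in a few lines of Python. The report does not name the hash function, so this sketch assumes 32-bit FNV-1a purely for illustration; any hash that fits in a guint would work the same way.

```python
def fnv1a_32(data: bytes) -> int:
    # Assumed hash function: 32-bit FNV-1a (the report does not name one)
    h = 0x811C9DC5
    for b in data:
        h ^= b
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h

def sorted_key(word: str) -> str:
    return "".join(sorted(word.upper()))  # "HEART" -> "AEHRT"

def anagram_hash(word: str) -> int:
    return fnv1a_32(sorted_key(word).encode("ascii"))

print(sorted_key("HEART"), sorted_key("EARTH"))        # AEHRT AEHRT
print(anagram_hash("HEART") == anagram_hash("EARTH"))  # True
```

    Equal sorted keys always give equal hashes. The reverse is not guaranteed, since two different sorted words can in principle collide on a 32-bit hash.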

    Implementation

    We solve this problem in two stages:

    Stage 1: At compile time

    We create a new section in the word-list file for anagram fragments. It has the following two parts:

    1. anagram word list: A list of word indices, grouped so that words with the same hash are stored together. Each word is stored as a gushort (2 bytes) index.
    2. anagram hash index: Every entry (group) in the anagram word list has a corresponding entry in the anagram hash index section, where we store the hash (guint, 4 bytes), the offset of the list entry (guint, 4 bytes) and the length of the entry (gchar, 1 byte), for a total size of 9 bytes per index entry.
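    A minimal sketch of this compile-time step in Python, using the struct module to lay out the bytes. Several details are assumptions not stated in the report: FNV-1a as the hash, little-endian byte order, offsets measured in bytes from the start of the anagram word list, one group for every word (including words with no other anagrams), and the function name build_anagram_sections.

```python
import struct
from collections import defaultdict

# Assumed layout: hash (guint), offset (guint), count (1 byte); "<" means no padding
ENTRY = struct.Struct("<IIB")  # 9 bytes

def fnv1a_32(data: bytes) -> int:
    # Assumed hash: 32-bit FNV-1a (the report does not name one)
    h = 0x811C9DC5
    for b in data:
        h ^= b
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h

def anagram_hash(word: str) -> int:
    return fnv1a_32("".join(sorted(word.upper())).encode("ascii"))

def build_anagram_sections(word_list):
    """Return (anagram word list bytes, anagram hash index bytes)."""
    assert len(word_list) <= 0x10000, "indices must fit in a gushort"
    groups = defaultdict(list)
    for index, word in enumerate(word_list):
        groups[anagram_hash(word)].append(index)

    word_section = bytearray()
    hash_index = bytearray()
    for h in sorted(groups):  # sorted by hash so stage 2 can binary-search it
        indices = groups[h]
        # Offset = where this group's indices start; assumes a group has < 256 words
        hash_index += ENTRY.pack(h, len(word_section), len(indices))
        word_section += struct.pack(f"<{len(indices)}H", *indices)  # 2 bytes each
    return bytes(word_section), bytes(hash_index)

words = ["HEART", "BAT", "EARTH", "TAB", "CAT"]
word_section, hash_index = build_anagram_sections(words)
print(len(word_section), len(hash_index))  # 2 bytes per word, 9 per distinct hash
```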

    Stage 2: At run time

    Here we search for anagrams in the data created in stage 1. The search works as follows:

    • Suppose we have the word “BAT”. We sort its letters and hash the result; let’s call the generated hash H1, and the length of the word (3) LE1.
    • Search for H1 in the anagram hash index section. Because the section is always sorted, this is a binary search, and thus very fast.
    • Once the entry is found, store the offset (the next 4 bytes), called O1, and the length (the next 1 byte), called L1. The offset points to the anagram indices stored in the anagram-word-list section, and the length tells us the number of anagrams.
    • Go to O1 and read the next L1 * 2 bytes. Every 2 bytes here are the index of one of the words that we want.
    • We get the indices, let’s say I1 and I2, select the Fragment list for words of length LE1, and read the words at indices I1 and I2. These are the required anagrams.
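    A matching run-time sketch in Python. It builds a tiny index in memory so it can run on its own, then follows the steps above: hash the sorted letters, binary-search the 9-byte entries, and read L1 two-byte indices starting at O1. FNV-1a, the little-endian layout and byte offsets are assumptions carried over from stage 1. The sketch also looks words up in one flat list rather than a per-length Fragment list, and it re-checks the sorted letters of each result in case of a hash collision, which the report does not describe.

```python
import struct

# Assumed layout: little-endian hash (guint), byte offset (guint), count (1 byte)
ENTRY = struct.Struct("<IIB")

def fnv1a_32(data: bytes) -> int:
    # Assumed hash: 32-bit FNV-1a
    h = 0x811C9DC5
    for b in data:
        h ^= b
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h

def sorted_key(word: str) -> str:
    return "".join(sorted(word.upper()))

def find_anagrams(word, word_list, word_section, hash_index):
    h1 = fnv1a_32(sorted_key(word).encode("ascii"))
    lo, hi = 0, len(hash_index) // ENTRY.size
    while lo < hi:  # binary search: entries are sorted by hash
        mid = (lo + hi) // 2
        h, o1, l1 = ENTRY.unpack_from(hash_index, mid * ENTRY.size)
        if h < h1:
            lo = mid + 1
        elif h > h1:
            hi = mid
        else:
            # Read L1 * 2 bytes at O1: one 2-byte word index per anagram
            indices = struct.unpack_from(f"<{l1}H", word_section, o1)
            found = [word_list[i] for i in indices]
            # Extra check not described in the report: drop hash collisions
            return [w for w in found if sorted_key(w) == sorted_key(word)]
    return []

# Build a tiny in-memory index so the example runs on its own
words = ["HEART", "BAT", "EARTH", "TAB"]
groups = {}
for i, w in enumerate(words):
    groups.setdefault(fnv1a_32(sorted_key(w).encode("ascii")), []).append(i)
word_section, hash_index = bytearray(), bytearray()
for h in sorted(groups):
    hash_index += ENTRY.pack(h, len(word_section), len(groups[h]))
    word_section += struct.pack(f"<{len(groups[h])}H", *groups[h])

print(find_anagrams("TBA", words, word_section, hash_index))  # ['BAT', 'TAB']
```

    Because every hash-index entry has the same fixed size of 9 bytes, entry k starts at byte 9 * k, which is what lets the binary search jump straight to the middle entry without parsing the ones before it.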

    We have now found the anagrams and can show them to the user.

    What’s done

    The code that writes this data into the word-list file has been written and pushed to the main repository. It creates the anagram-hash-index and anagram-word-list sections discussed above.

    What’s left to do

    We still need to read the word-list file at run time, find the anagrams using the approach described above and display them to the user.

      medium.com/@prathamgupta1455/gsoc-2023-final-report-61a19a7117f5