
      GPT Systems and Relationships

      pubsub.slavino.sk / planetdebian · Sunday, 20 August, 2023 - 04:32 · 2 minutes

    Sam Hartman wrote an interesting blog post about his work as a sex and intimacy educator and how GPT systems could impact that [1].

    I’ve read some positive reviews of Replika – a commercial system that is somewhat promoted as a counsellor [2] – so I decided to try it out. In my brief trial it seemed to use all the methods that Android pay-to-play games are known for: multiple types of in-game currency, paying to buy new clothes for your virtual friend, and so on. Basically it seems pretty horrible. I didn’t pay for it, and since the erotic and romantic features all require payment, I didn’t test those.

    Thinking about this logically, a system designed to deal with people when they are vulnerable (either being in a romantic relationship or getting counselling) that uses manipulative techniques to extract money from them can’t have a good result. So a free software system seems the best option.

    When I first learned of virtual girlfriends I never thought I would feel compelled to advocate for a free software virtual dating program, but that’s where the world has got to.

    Virtual girlfriends have been around for years now. Several years ago I watched a documentary about their use in Japan. It seemed a bit strange when a group of men who had virtual girlfriends held a dinner party with their tablets and phones propped up so their girlfriends could join in, as they all appeared to be dating the same girl. The documentary didn’t go into enough detail to cover whether the girlfriend app could learn, or be customised enough, that they would seem to have different personalities.

    Virtual boyfriends have also been around for a while apparently without most people noticing. I just Googled it and found a review of a virtual boyfriend app published in 2016!

    One thing that will probably concern people is the possibility of virtual dating systems being used for inappropriate things. That is a reasonable concern, but I don’t think it’s possible to prevent technology that has already been released from being used for such things. As a general rule, technology can always be used for good and bad, so we need to make it easy to do good things and let the legal system develop ways of dealing with the bad ones.




      KDE: A day in the life of the KDE snapcrafter!

      pubsub.slavino.sk / planetdebian · Friday, 18 August, 2023 - 18:26 · 2 minutes

    [Image: KDE Mascot]

    As mentioned last week, I am still looking for a super awesome team lead for a super amazing project involving KDE and Snaps. Time is running out, and the KDE world will be a better place if this project goes through! I would like to clarify that this is a paid position! A current KDE developer would be ideal, as it is a small team and your time will be split between managing and coding alike. If you or anyone you know might be interested, please contact me ASAP!

    On to snappy things I have achieved this week:

    Most of 23.04.3 is done; I am just testing the snaps now. With that said, I have seen --candidate channel apps being promoted on the internets. Please use this channel with the utmost care, as those builds are still being tested and could quite possibly be very broken!
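    (For clarity: the channel in question is the one you opt into with something like sudo snap install --candidate <app> or snap refresh --candidate <app>; if you never asked for it explicitly, you are on stable and unaffected.)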

    Still working on some QML issues with “kirigami platform not found” errors.

    I have begun the journey into the Launchpad build issues and have been kindly pointed to using snap recipes on Launchpad, so we aren’t doing public uploads (which create temporary recipes to build and cannot have their priority bumped). I have therefore sent a request into the kde-devel arena to revisit having per-repository snapcraft files (rejected in the past), as is done with flatpak files. So far I am getting positive feedback and hopefully this will go through. Once it does, I can move forward with fully automating new application releases. Hooray!

    This week I jumped into the xdg-desktop-portal rabbithole while working on https://bugs.kde.org/show_bug.cgi?id=473003 for NeoChat. After fixing it by adding the password-manager-service plug, I was told that auto-connecting that interface is discouraged, and that libsecret should work out of the box with portals. I found, and joined just in time, a snapcrafter Google Meet where we had a long conversation spitballing and testing our portal support. At least in Neon it appears to be broken. I now have some things to do and test to get this functional; most of our online apps are affected. For now though, snap connect neochat:password-manager-service :password-manager-service does work. Auto-connect was rejected as it exposes too much, which is understandable.
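    For anyone hitting the same libsecret issue, the manual wiring plus a quick check look like this (snap connections simply lists which plugs are wired up):

    # Manually connect NeoChat to the system password manager service
    sudo snap connect neochat:password-manager-service :password-manager-service
    # Confirm the plug is now connected
    snap connections neochat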

    I have started a new thread on the KDE forums for users to ask any questions, or to let me know of any issues you may have related to snaps: https://discuss.kde.org/t/all-things-snaps-questions-concerns-praise/4033 . Come join the conversation!

    In the snapcraft arena I have fixed my PR for the much needed qmake plugin! This should be merged and rolled out in the very near future!

    I would like to continue my hard work on snap things regardless of whether the project goes through. Unfortunately, to do so, I must ask for donations, as life isn’t free. I am working on self-sufficiency, but even that costs money to get started! KDE snaps are used by 1.7 million active devices! So I ask: if you use KDE snaps and find my work useful, or know someone who does, please consider donating to keep my momentum going. There is still much work to be done with the Qt6 rollout. I would like to work on the KDE Plasma snap and the KDE PIM suite of apps (I have started on the latter).

    Even if you can’t help, please share! Thank you for your consideration! I have a new donation form for anyone that doesn’t like GoFundMe here:

    GoFundMe:




      #43: r2u Faster Than the Alternatives

      pubsub.slavino.sk / planetdebian · Friday, 18 August, 2023 - 02:18 · 2 minutes

    Welcome to the 43rd post in the $R^4$ series.

    And with that, a good laugh. When I set up Sunday’s post, I was excited enough about the (indeed exciting!!) topic of r2u via browser or vscode that I mistakenly labeled it as the 41st post, and overlooked the existing 41st post from July! So it really is as if Douglas Adams, Arthur Dent, and, for good measure, Dirk Gently, looked over my shoulder and declared there shall not be a 42nd post!! So now we have two 41st posts: Sunday’s and July’s.

    Back to the current topic, which is of course r2u. Earlier this week we had a failure in an (R-based) CI run (using a default action which I had not set up). A package was newer in source than binary, so a build from source was attempted. And it of course failed, as it was a package needing a system dependency to build, which the default action did not install.

    I am familiar with the problem via my general use of r2u (or my r-ci which uses it under the hood). There we use a bspm variable to prefer the binary over the possibly newer source. So I was curious how one would address this with the default actions. It so happens that the same morning I spotted a StackOverflow question on the same topic, where the original poster had suffered the exact same issue!
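    For the curious, the knob I am referring to is a bspm option; a minimal sketch (assuming an Ubuntu system already set up with r2u and bspm) is to disable the version check so the binary always wins:

    # Sketch: prefer the r2u binary even when CRAN has a newer source package;
    # bspm.version.check=FALSE skips the source-versus-binary comparison.
    echo 'options(bspm.version.check=FALSE)' >> ~/.Rprofile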

    I offered my approach (via r2u) as a comment and was later notified of a follow-up answer by the OP. Turns out there is a new, more powerful action that does all this, potentially flipping to a newer version and building it, all while using a cache.

    Now I was curious, and in the evening I cloned the repo to study the new approach and compare the new action to what r2u offers. In particular, I was curious whether the use of caches would be beneficial on repeated runs. A screenshot of the resulting Actions and their times follows.

    [Screenshot: actions_r2u_vs_pak_2023-08-17_20-23.png]

    Turns out maybe not so much (yet?). As the actions page of my cloned ‘comparison repo’ shows in this screenshot, r2u is consistently faster, always below one minute, compared to the new entrant at always over two minutes. (I should clarify that the original action sets up dependencies, then scrapes, and commits. I am timing only the setup of dependencies here.)

    We can also extract the six datapoints and quickly visualize them.

    [Chart: r2u_vs_pak.png]

    Now, it is of course entirely possible that not all avenues for speedups were exploited in how the action setup was set up. If so, please file an issue at the repo and I will try to update accordingly. But for now it seems that a default setup of r2u is easily more than twice as fast as an otherwise very compelling alternative (with arguably much broader scope). However, where r2u chooses to play, on the increasingly common, popular and powerful Ubuntu LTS setup, it clearly continues to run circles around alternate approaches. So the saying remains:

    r2u: fast, easy, reliable.

    If you like this or other open-source work I do, you can now sponsor me at GitHub .

    This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

    Originally posted 2023-08-13, minimally edited 2023-08-15, which changed the timestamp and URL.




      Debian 30th Birthday: Local Group event and Interview

      pubsub.slavino.sk / planetdebian · Thursday, 17 August, 2023 - 21:46 · 3 minutes

    Inspired by the fine Debian Local Groups all over the world, I’ve long since wanted to start one in Cape Town. Unfortunately, there have been many obstacles over the years. Shiny distractions, an epidemic, DPL terms… these are just some of the things that got in the way.

    Fortunately, things are starting to gain traction, and we’re well on our way to forming a bona fide local group for South Africa.

    We got together at Woodstock Grill; they have a nice meeting room as well as good food and beverages, and are reasonably central for most of us.

    Cake

    Starting with the important stuff: we got this Debian cake made, and it ended up much bigger than we expected. At least we all got to take some home! (It tasted great, too.)

    Yes, cake.

    Talk

    This event was planned very last minute, so we didn’t do any kind of RSVP and I had no idea who exactly would show up. I went ahead and prepared one of my usual introduction-to-Linux-and-Debian talks, covering how these things are having an impact on the world out there. I also talked a bit about the community and how we intend to grow our local team here in South Africa. It turned out most of the audience were already experienced Linux users, but I was happy to see that they were very enthusiastic about the local group concept!

    While reading through some material to find inspiration for this talk, I came across an old quote from the original Debian Manifesto that I found very poignant again, so I feel compelled to share it (I didn’t use it in my talk this time, though, since I didn’t cover much in the way of current events):

    “The time has come to concentrate on the future of Linux rather than on the destructive goal of enriching oneself at the expense of the entire Linux community and its future.” – Ian Murdock, 1994

    Debian-ZA logo

    Tammy spent some time creating a whole bunch of logo concepts, which she presented to us. They aren’t meant as final logo choices, but as initial concepts, and they worked well to spark a very lively discussion about logos and design!

    Here are just some of her designs, which I cherry-picked because they were the most discussed. We still haven’t decided whether it will be Debian ZA, Debian-ZA or Debian South Africa, although the last will probably cause the least confusion internationally.

    Personally, the last one in this image, which I referred to as “the carpet”, is my favourite :-)

    Happy Birthday Song

    John and Heila wrote a happy birthday song. After seeing the lyrics (and considering myself an amateur lyricist), I thought it was way too tacky and told John to put it away. But when the cake came out, someone said “we should sing!”, and the lyrics quickly re-emerged and were handed out to everyone. It’s also meant to be a looping clip for the upcoming DebConf23. I’ll concede that it worked out alright in the end! You judge for yourself:

    Mousepads

    People still use mousepads? That was my initial reaction when Heila told me she was going to make some commemorative 30-year Debian mousepads for our birthday event, yet they ended up being popular. A mousepad is probably a safer surface to put dev boards on than my desk directly, so at least I do have a use for one!

    Group Photo

    The group photo, complete with disco ball! We should’ve taken this earlier because it was already getting late and some people had to head back home. Lesson learned for next time!

    30th Birthday Interview with The Changelog

    I also did an interview with The Changelog for Debian’s 30th birthday . It was late, and I haven’t had a chance to listen to it yet, so I hope their producers managed to edit something coherent out of my usual Debian babbling:

    More later

    There’s so much more I’d like to say about Debian, the last 30 years, local groups, and the ecosystem that we find ourselves in. And, Lemmy! But I’ll see if I can get an instance up over the weekend and will then talk about that some more another time.




      A First Exercise with AI Training

      pubsub.slavino.sk / planetdebian · Wednesday, 16 August, 2023 - 14:13 · 5 minutes

    Taking a hands-on, low-level approach to learning AI has been incredibly rewarding. I wanted to create an achievable task that would motivate me to learn the tools and get practical experience training and using large language models. Just at the point when I was starting to spin up GPU instances, Llama2 was released to the public, so I elected to start with that model. As I mentioned, I’m interested in exploring how sex-positive AI can help human connection in positive ways. For that reason, I suspected that Llama2 might not produce good results without training: some of Meta’s safety goals run counter to what I’m trying to explore. I suspected that more attention might have been paid to safety in the chat variants of Llama2 than in the text generation variants, and working against that might be challenging for a first project, so I started with Llama-2-13b as a base.

    Preparing a Dataset

    I elected to generate a fine-tuning dataset using fiction. Long term, that might not be a good fit. But I’ve always wanted to understand how an LLM’s tone is adjusted: how you get an LLM to speak in a different voice. So much of fine tuning focuses on examples where a given prompt produces a particular result. I wanted to understand how to bring in data that wasn’t structured as prompts. The Hugging Face course actually gives an example of how to adjust a model set up for masked language modeling, trained on wikitext, to better predict the vocabulary of movie reviews. There, though, breaking the dataset into samples at movie-review boundaries makes sense. There’s another example of training an LLM from scratch based on a corpus of Python code. Between these two examples, I figured out what I needed. It was relatively simple in retrospect: tokenize the whole mess, and treat everything as output. That is, compute loss on all the tokens.
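    (Stated as a formula, this is just the standard causal language-modeling objective, L = -(1/N) * Σ_i log p(t_i | t_1, …, t_{i-1}), with the sum running over all N tokens of the concatenated corpus rather than only over the tokens of a designated completion.)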

    Long term, using fiction as a way to adjust how the model responds is likely to be the wrong starting point. However, it maximized focus on aspects of training I did not understand and allowed me to satisfy my curiosity.

    Wrangling the Model

    I decided to actually try to add additional training to the model directly rather than building an adapter and fine-tuning a small number of parameters. Partially this was because I had enough on my mind without also understanding how LoRA adapters work. Partially, I wanted to gain an appreciation for the infrastructure complexity of AI training. I have enough of a cloud background that I ought to be able to work on distributed training. (As it turned out, using the BitsAndBytes 8-bit optimizer, I was just able to fit my task onto a single GPU.)

    I wasn’t even sure that I could make a measurable difference in Llama-2-13b by running 890,000 training tokens through a couple of training epochs. As it turned out, I had nothing to fear on that front.

    Getting everything to work was trickier than I expected. I didn’t have an appreciation for exactly how memory-intensive training is. The Transformers documentation points out that with typical parameters for mixed-precision training, it takes 18 bytes per model parameter. Using bfloat16 training and an 8-bit optimizer was enough to get things to fit.
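    A back-of-the-envelope calculation (my rough numbers, not from the documentation) shows why: 13 billion parameters at 18 bytes each is roughly 234 GB, far more than any single GPU holds. With bfloat16 weights and gradients (2 bytes each) and an 8-bit optimizer (roughly 2 bytes of state per parameter), that drops to around 6 bytes per parameter, on the order of 78 GB, which is just small enough to squeeze onto a single large GPU.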

    Of course then I got to play with convergence. My initial optimizer parameters caused the model to diverge, and before I knew it, my model had turned to NaN and would only output newlines. Oops. But looking back over the logs, watching what happened to the loss, and looking at the math in the optimizer to understand how I ended up with something that rounded to a divide-by-zero gave me a much better intuition for what was going on.

    The Results

    This time around I didn’t do anything in the way of quantitative analysis of what I achieved. Empirically, I definitely changed the tone of the model. The base Llama-2 model tends to steer away from sexual situations. It’s relatively easy to get it to talk about affection and sometimes attraction. Unsurprisingly, given the design constraints, it takes a bit to get it to wander into sexual situations. But if you hit it hard enough with your prompt, it will go there, and the results are depressing. At least for the prompts I used, it tended to view sex fairly negatively. It tended to be less coherent than with other prompts. One inference managed to pop out, in the middle of some text that wasn’t hanging together well, “Chapter 7 - Rape.”

    With my training, I did manage to achieve my goal of getting the model to use more positive language and emotional signaling when talking about sexual situations. More importantly, I gained a practical understanding of many ways training can go wrong.

    • There were overfitting problems: names of characters from my dataset got more attention than I wished. As a model for interacting with some of the universes I used as input, that was kind of cool, but since I was looking to just adjust how the model talked about intimate situations, I clearly made things far too specific.

    • I gained a new appreciation for how easy it is to trigger catastrophic forgetting.

    • I began to appreciate how this sort of unsupervised training could best be paired with supervised training to help correct model confusion. Playing with the model, I often ran into cases where my reaction was “Well, I don’t want to train it to give that response, but if it ever does wander into this part of the state space, I’d like it to at least respond more naturally.” And I think I understand how to approach that, either with custom loss functions or by manipulating which tokens contribute to the loss and which do not.

    • And of course I realized I need to learn a lot about sanitizing and preparing datasets.

    A lot of the articles I’ve been reading about training now make more sense. I have better intuition for why you might want to do training a certain way, or why mechanisms for countering some problem will be important.

    Future Activities:

    • Look into LoRA adapters; having understood what happens when you manipulate the model directly, I can now move on to intelligent solutions.

    • Look into various mechanisms for rewards and supervised training.

    • See how hard it is to train a chat based model out of some of its safety constraints.

    • Construct datasets; possibly looking at sources like relationship questions/advice.






      Enforcing wrap-and-sort -satb

      pubsub.slavino.sk / planetdebian · Wednesday, 16 August, 2023 - 09:00 · 1 minute

    For Debian package maintainers, the wrap-and-sort tool is one of those nice tools that I use once in a while, and every time I have to re-read the documentation to conclude that I want the --wrap-always --short-indent --trailing-comma --sort-binary-package options (or -satb for short). Every time, I also wish I could automate this and have it always be invoked, to keep my debian/ directory tidy, so I don’t have to do this manually once every blue moon. I haven’t found a way to achieve this automation in a non-obtrusive way that interacts well with my git-based packaging workflow. Ideally I would like for something like the lintian hook during gbp buildpackage to check for this – ideas?
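    One (admittedly somewhat obtrusive) option would be a local git pre-commit hook that performs the same check as the autopkgtest below; a minimal, untested sketch, assuming debian/ sits at the top of the repository:

    #!/bin/sh
    # .git/hooks/pre-commit (untested sketch): refuse to commit while
    # debian/ is not wrap-and-sort clean.
    set -eu

    TMPDIR=$(mktemp -d)
    trap "rm -rf $TMPDIR" 0 INT QUIT ABRT PIPE TERM

    cp -a debian $TMPDIR
    (cd $TMPDIR && wrap-and-sort -satb)
    if ! diff -ur debian $TMPDIR/debian; then
        echo "debian/ is not wrap-and-sort clean; run: wrap-and-sort -satb" >&2
        exit 1
    fi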

    Meanwhile, I have come up with a way to make sure I don’t forget to run wrap-and-sort for long, and that others who work on the same package won’t either: create an autopkgtest which is invoked during the Salsa CI/CD pipeline, using the following as debian/tests/wrap-and-sort:

    #!/bin/sh

    set -eu

    # Work on a copy of debian/ in a temporary directory
    TMPDIR=$(mktemp -d)
    trap "rm -rf $TMPDIR" 0 INT QUIT ABRT PIPE TERM

    cp -a debian $TMPDIR
    cd $TMPDIR
    # Normalize the copy, then fail if it differs from the original
    wrap-and-sort -satb
    diff -ur $OLDPWD/debian debian
    

    Add the following to debian/tests/control to invoke it. It is intentionally not indented properly, so that the self-test will fail and you will learn how it behaves.

    Tests: wrap-and-sort
    Depends: devscripts, python3-debian

    Now I will get build failures in the pipeline once I upload the package to Salsa, which I usually do before uploading to Debian. I will get a diff output, and it won’t be happy until I push a commit with the output of running wrap-and-sort with the parameters I settled on.

    While autopkgtest is intended to test the installed package, the tooling around autopkgtest is powerful and easily allows this mild abuse of its purpose for a pleasant QA improvement.

    Thoughts? Happy hacking!




      Perl test suites in GitLab

      pubsub.slavino.sk / planetdebian · Wednesday, 16 August, 2023 - 08:05 · 6 minutes

    I've been maintaining a number of Perl software packages recently. There's SReview, my video review and transcoding system, of which I split off Media::Convert a while back; and as of about a year ago, I've also added PtLink, an RSS aggregator (with future plans for more than just that).

    All these come with extensive test suites which can help me ensure that things continue to work properly when I play with things; and all of these are hosted on salsa.debian.org, Debian's gitlab instance. Since we're there anyway, I configured GitLab CI/CD to run a full test suite of all the software, so that I can't forget, and also so that I know sooner rather than later when things start breaking.

    GitLab has extensive support for various test-related reports, and while it took a while to be able to enable all of them, I'm happy to report that today, my perl test suites generate all three possible reports. They are:

    • The coverage regex, which captures the total reported coverage for all modules of the software; it will show the test coverage on the right-hand side of the job page (as in this example), and it will show what the delta in that number is in merge request summaries (as in this example)
    • The JUnit report, which tells GitLab in detail which tests were run, what their result was, and how long the test took (as in this example )
    • The cobertura report, which tells GitLab which lines in the software were run in the test suite; it will show coverage of affected lines in merge requests, but nothing more. Unfortunately, I can't show an example here, as the information seems to be no longer available once the merge request has been merged.

    Additionally, I also store the native perl Devel::Cover report as job artifacts, as they show some information that GitLab does not.

    It's important to recognize that not all data is useful. For instance, the JUnit report allows for a test name and for details of the test. However, the module that generates the JUnit report from TAP test suites does not make a distinction here; both the test name and the test details are reported as the same. Additionally, the time a test took is measured as the time between the end of the previous test and the end of the current one; there is no "start" marker in the TAP protocol.

    That being said, it's still useful to see all the available information in GitLab. And it's not even all that hard to do:

    test:
      stage: test
      image: perl:latest
      coverage: '/^Total.* (\d+.\d+)$/'
      before_script:
        - cpanm ExtUtils::Depends Devel::Cover TAP::Harness::JUnit Devel::Cover::Report::Cobertura
        - cpanm --notest --installdeps .
        - perl Makefile.PL
      script:
        - cover -delete
        - HARNESS_PERL_SWITCHES='-MDevel::Cover' prove -v -l -s --harness TAP::Harness::JUnit
        - cover
        - cover -report cobertura
      artifacts:
        paths:
        - cover_db
        reports:
          junit: junit_output.xml
          coverage_report:
            path: cover_db/cobertura.xml
            coverage_format: cobertura
    

    Let's expand on that a bit.

    The first three lines should be clear for anyone who's used GitLab CI/CD in the past. We create a job called test ; we start it in the test stage, and we run it in the perl:latest docker image. Nothing spectacular here.

    The coverage line contains a regular expression. This is applied by GitLab to the output of the job; if it matches, then the first bracket match is extracted, and whatever it contains is assumed to be the code coverage percentage for the code; it will be reported as such in the GitLab UI for the job that was run, and graphs may be drawn to show how the coverage changes over time. Additionally, merge requests will show the delta in the code coverage, which may help in deciding whether to accept a merge request. This regular expression will match on a line that the cover program generates on its standard output.
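    For reference, the line in question is the last row of the summary table that cover prints; the first bracket match then captures the final number on that line. The figures below, for a made-up module, are invented, but they show the shape the regular expression keys on:

    File                   stmt   bran   cond    sub    pod   time  total
    blib/lib/Foo.pm        95.2   83.3   66.7  100.0  100.0   98.7   92.1
    Total                  95.2   83.3   66.7  100.0  100.0   98.7   92.1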

    The before_script section installs various perl modules we'll need later on. First, we install ExtUtils::Depends. My code uses ExtUtils::MakeMaker, which ExtUtils::Depends depends on (no pun intended); obviously, if your perl code doesn't use that, then you don't need to install it. The next three modules -- Devel::Cover, TAP::Harness::JUnit and Devel::Cover::Report::Cobertura -- are necessary for the reports, and you should include them if you want to copy what I'm doing.

    Next, we install declared dependencies, which is probably a good idea for you as well, and then we run perl Makefile.PL, which will generate the Makefile. If you don't use ExtUtils::MakeMaker, update that part to do what your build system uses. That should be fairly straightforward.

    You'll notice that we don't actually use the Makefile. This is because we only want to run the test suite, which in our case (since these are PurePerl modules) doesn't require us to build the software first. One might consider that this makes the call of perl Makefile.PL useless, but I think it's a useful test regardless; if that fails, then obviously we did something wrong and shouldn't even try to go further.

    The actual tests are run inside a script snippet, as is usual for GitLab. However we do a bit more than you would normally expect; this is required for the reports that we want to generate. Let's unpack what we do there:

    cover -delete
    

    This deletes any coverage database that might exist (e.g., due to caching or some such). We don't actually expect any coverage database, but it doesn't hurt.

    HARNESS_PERL_SWITCHES='-MDevel::Cover'
    

    This tells the TAP harness that we want it to load the Devel::Cover addon, which can generate code coverage statistics. It stores that in the cover_db directory, and allows you to generate all kinds of reports on the code coverage later (but we don't do that here, yet).

    prove -v -l -s
    

    Runs the actual test suite, with -v for verbose output, -s to shuffle (i.e., randomize) the test suite, and -l to add the lib directory to perl's include path. This works for us, again, because we don't actually need to compile anything; if you do, then -b (for blib) may be required.

    ExtUtils::MakeMaker creates a test target in its Makefile, and usually this is how you invoke the test suite. However, it's not the only way to do so, and indeed if you want to generate a JUnit XML report then you can't do that. Instead, in that case, you need to use prove directly, so that you can tell it to load the TAP::Harness::JUnit module by way of the --harness option, which will then generate the JUnit XML report. By default, the JUnit XML report is generated in the file junit_output.xml. It's possible to customize the filename for this report, but GitLab doesn't care and neither do I, so I don't. Uploading the JUnit XML report tells GitLab which tests were run and what each test's result was.

    Finally, we invoke the cover script twice to generate two coverage reports: once to generate the default report (which generates HTML files with detailed information on all the code that was triggered in your test suite), and once with the -report cobertura parameter, which generates the cobertura XML format.

    Once we've generated all our reports, we then need to upload them to GitLab in the right way. The native perl report, which is in the cover_db directory, is uploaded as a regular job artifact, which we can then look at through a web browser, and the two XML reports are uploaded in the correct way for their respective formats.

    All in all, I find that doing this makes it easier to understand how my code is tested, and why things go wrong when they do.



    This post is public: grep.be/blog//en/computer/Perl_test_suites_in_GitLab/


      I’ve forgotten “make clean”.

      pubsub.slavino.sk / planetdebian · Wednesday, 16 August, 2023 - 04:32

    I can no longer remember the last time I used the make clean command. When I package for Debian, the work happens in a git repository, and I use the commands git clean -fdx ; git checkout . , which I can recall from my command history via Ctrl-r most of the time. In the other cases, if the sources are not already in git, then the commands git init . ; git add . ; git commit -m 'hopla' take care of the problem.




      #41: Using r2u in Codespaces

      pubsub.slavino.sk / planetdebian · Tuesday, 15 August, 2023 - 17:21 · 4 minutes

    Welcome to the 41st post in the $R^4$ series. This post draws on joint experiments first started by Grant, building on the lovely work of Eitsupi as part of our Rocker Project. In short, r2u is an ideal match for Codespaces, a Microsoft/GitHub service to run code ‘locally but in the cloud’ via browser or Visual Studio Code. This post co-serves as the README.md in the .devcontainer directory as well as a vignette for r2u.

    So let us get into it. Starting from the r2u repository, the .devcontainer directory provides a small self-contained file devcontainer.json to launch an executable R environment using r2u. It is based on the example in Grant McDermott’s codespaces-r2u repo and reuses its documentation. It is driven by the Rocker Project’s Devcontainer Features repo, creating a fully functioning R environment for cloud use in a few minutes. And thanks to r2u you can easily add to this environment by installing new R packages in a fast and failsafe way.

    Try it out

    To get started, simply click on the green “Code” button at the top right. Then select the “Codespaces” tab and click the “+” symbol to start a new Codespace.

    [Screenshot: codespaces.png]

    The first time you do this, it will open up a new browser tab where your Codespace is being instantiated. This first-time instantiation will take a few minutes (feel free to click “View logs” to see how things are progressing) so please be patient. Once built, your Codespace will deploy almost immediately when you use it again in the future.

    [Screenshot: instantiate.png]

    After the VS Code editor opens up in your browser, feel free to open up the examples/sfExample.R file. It demonstrates how r2u lets us install packages and their system dependencies with ease, here installing packages sf (including all its geospatial dependencies) and ggplot2 (including all its dependencies). You can run the code easily in the browser environment: highlight or hover over line(s) and execute them by hitting Cmd+Return (Mac) / Ctrl+Return (Linux / Windows).

    [Screenshot: sfExample.png]

    (Both example screenshots reflect the initial codespaces-r2u repo as well as the personal scratchspace one we started with; both of course work here too.)

    Do not forget to close your Codespace once you have finished using it. Click the “Codespaces” tab at the very bottom left of your code editor / browser and select “Close Current Codespace” in the resulting pop-up box. You can restart it at any time, for example by going to https://github.com/codespaces and clicking on your instance.

    Extend r2u with r-universe

    r2u offers “fast, easy, reliable” access to all of CRAN via binaries for Ubuntu focal and jammy. When using the latter (as is the default), it can be combined with r-universe and its Ubuntu jammy binaries. We demonstrate this in a second example file, examples/censusExample.R, which installs both the cellxgene-census and tiledbsoma R packages as binaries from r-universe (along with about 100 dependencies), downloads single-cell data from Census, and uses Seurat to create PCA and UMAP decomposition plots. Note that in order to run this you have to change the Codespaces default instance from ‘small’ (4gb RAM) to ‘large’ (16gb RAM).

    [Screenshot: censusExample.png]

    Local DevContainer build

    Codespaces are DevContainers running in the cloud (where DevContainers are themselves just Docker images running with some VS Code sugar on top). This gives you the very powerful ability to ‘edit locally’ but ‘run remotely’ in the hosted codespace. To test this setup locally, simply clone the repo and open it up in VS Code. You will need to have Docker installed and running on your system (see here ). You will also need the Remote Development extension (you will probably be prompted to install it automatically if you do not have it yet). Select “Reopen in Container” when prompted. Otherwise, click the >< tab at the very bottom left of your VS Code editor and select this option. To shut down the container, simply click the same button and choose “Reopen Folder Locally”. You can always search for these commands via the command palette too ( Cmd+Shift+p / Ctrl+Shift+p ).

    Use in Your Repo

    To add this ability of launching Codespaces in the browser (or editor) to a repo of yours, create a directory .devcontainer in your selected repo, and add the file .devcontainer/devcontainer.json. You can customize it by enabling other features, or use the postCreateCommand field to install packages (while taking full advantage of r2u).
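    For orientation, the overall shape is roughly as follows. This is a sketch only, with placeholder image and feature identifiers; copy the canonical .devcontainer/devcontainer.json from the r2u (or codespaces-r2u) repository rather than this:

    # Sketch: scaffold a devcontainer; the JSON body is illustrative,
    # not the canonical r2u configuration.
    mkdir -p .devcontainer
    cat > .devcontainer/devcontainer.json <<'EOF'
    { "name": "r2u",
      "image": "ubuntu:jammy",
      "features": { "ghcr.io/rocker-org/devcontainer-features/r-apt:0": {} },
      "postCreateCommand": "R -q -e 'install.packages(\"ggplot2\")'" }
    EOF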

    Acknowledgments

    There are a few key “plumbing” pieces that make everything work here. Thanks to:

    Colophon

    More information about r2u is at its site, and we have answered some questions in the issues and at StackOverflow. More questions are always welcome!

    If you like this or other open-source work I do, you can now sponsor me at GitHub .

    This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

