
      C2PA's Time Warp

pubsub.slavino.sk / hackerfactor · Monday, 4 March - 17:35 · 12 minutes

    Throughout my review of the C2PA specification and implementation, I've been focused on how easy it is to create forgeries that appear authentic. But why worry about forgeries when C2PA can't even get ordinary uses correct?

Just consider the importance of the recorded timestamps. Accurate time records can resolve questions related to ordering and precedence, like "when did this happen?" and "who had it first?" Timestamps can address copyright assignment issues and are used in investigations to determine whether something could or could not have happened.

    At my FotoForensics service, I've seen an increase in pictures containing C2PA metadata. They have come from Adobe, Microsoft, OpenAI (and DALL-E), Stability AI (Stable Diffusion), Leica (camera company), and others. Unfortunately, with more images, I'm seeing more problems -- including problems with timestamps.

I typically use my FotoForensics service for these analysis blog entries. However, this time I'm going to use my Hintfo service (hintfo.com) to show the metadata. I also want to emphasize that all of the examples in this blog entry were submitted by real people to the public FotoForensics service; I didn't manufacture any of these pictures.

    Out of Sync

    I first noticed the problem with Microsoft's AI-generated pictures. For example:

[Image: an AI-generated example picture from Microsoft; the C2PA metadata can be viewed at Hintfo.]

Adobe's Content Credentials web site does not identify any issues with this picture. However, the internal metadata contains two interesting timestamps. I extracted them using Adobe's c2patool. The first timestamp is part of the provenance: how, what, and when the picture was created:
    "assertion_store": {
    "c2pa.actions": {
    "actions": [
    {
    "action": "c2pa.created",
    "description": "AI Generated Image",
    "softwareAgent": "Bing Image Creator",
    "when": "2024-01-28T19:34:25Z"
    }
    ]
    }

    This provenance information identifies an AI Generated Image. It was created by Microsoft's Bing Image Creator on 2024-01-28 at 19:34:25 GMT.

    The other timestamp identifies when the metadata was notarized by an external third-party signatory:
    "signature": {
    "alg": "ps256",
    "issuer": "Microsoft Corporation",
    "time": "2024-01-28T19:34:24+00:00"
    }

The external third-party timestamp authority works like a notary. It authoritatively states that it saw a signature for this picture at a specific date and time. The picture had to exist at or before this timestamp; it could not have been created later.

Adobe's c2patool has a bug that conflates information from different X.509 certificates. The cryptographic signature over the entire file was issued by Microsoft, but the time from the timestamp authority response was issued by DigiCert (not Microsoft); DigiCert isn't mentioned anywhere in the c2patool output. This bug gives the false impression that Microsoft notarized their own data. To be clear: Microsoft generated the file and DigiCert notarized it. Although attribution is a critical component of provenance, Adobe's c2patool mixes up the information and omits a signatory's identification, resulting in a misleading attribution. (This impacts both Adobe's c2patool and Adobe's Content Credentials web site.)

Ignoring the attribution bug, we can combine these provenance and notary timestamps with the time when FotoForensics received the picture; FotoForensics marks the last possible modification time, since the files are stored on my servers in a forensically sound manner:

    2024-01-28 19:34:24 GMT x.509 Signed Timestamp Trusted external timestamp from DigiCert
    2024-01-28 19:34:25 GMT JUMBF: AI image created Internal C2PA metadata from Microsoft
    2024-02-01 10:33:29 GMT FotoForensics: Received File cannot be modified after this time
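If you want to check a file yourself, here is a minimal sketch that pulls both timestamps out of c2patool's JSON output and compares them. It assumes c2patool, jq, and GNU date are available, and the exact JSON field layout may vary between c2patool versions:

#!/bin/bash
# Compare the C2PA creation time against the timestamp authority's signed time.
file="$1"
created=$(c2patool "$file" | jq -r '.. | .when? // empty' | head -n 1)
notarized=$(c2patool "$file" | jq -r '.. | .time? // empty' | head -n 1)
echo "created:   $created"
echo "notarized: $notarized"
if [ "$(date -d "$created" +%s)" -gt "$(date -d "$notarized" +%s)" ]; then
  echo "WARNING: the creation time postdates the notarized timestamp"
fi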

The problem, as shown by the timeline, is that Bing Image Creator's creation date is one second after the picture was notarized by the external third party. There are a few ways this can happen:
    • The external signer could have supplied the wrong time. In this case, the external signer is DigiCert. DigiCert abides by the X.509 certificate standards and maintains a synchronized clock. If we have to trust anything in this example, then I trust the timestamp from DigiCert.
    • Microsoft intentionally post-dated their creation time. (Seems odd, but it's an option.)
• Microsoft's server is not using a synchronized clock. As noted in RFC 3628 (sections 4.3, 6.2, 6.3, and 7.3.1d), clocks need to be accurately synchronized. There could be a teeny tiny amount of drift, but certainly not on the order of tenths of a second, let alone a full second. (A quick way to check a host's clock synchronization is sketched after this list.)
    • Microsoft modified the file after it was notarized. This is the only option that we can immediately rule out. Changing Microsoft's timestamp from "19:34:25" to "19:34:24" causes the cryptographic signature to fail. This becomes a detectable alteration. We can be certain that the signed file said "19:34:25" and not "19:34:24" in the provenance record.
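As an aside, checking whether a server's clock is synchronized is trivial. A minimal sketch, assuming chrony or an ntpdate-style client is installed (I obviously can't confirm what Microsoft runs):

chronyc tracking | grep -E 'System time|Last offset'   # offset from NTP sources (chrony)
ntpdate -q pool.ntp.org                                # one-shot query without chrony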
    Now, I know what you're thinking. This might be a one-off case. The X.509 timestamp authority system permits clocks to drift by a tiny fraction of a second. With 0.00001 seconds drift, 24.99999 and 25.00000 seconds can be equivalent. With integer truncation, this could look like 24 vs 25 seconds. However, I'm seeing lots of pictures from Microsoft that contain this same "off by 1 second" error. Here are a few more examples:

[Images: three more Microsoft examples -- a dog named Lucy, an apple, and waffles.]

The Lucy/dog picture is from Bing Image Creator, the apple picture is from Microsoft Designer, and the waffles are from Microsoft's Azure DALL-E service. All of these files have the same "off by 1 second" error. In fact, the majority of pictures that I see from Microsoft have this same error. If I had to venture a guess, I'd say Microsoft's clocks were out of sync by almost a full second.

Being inaccurate by one second usually isn't a big deal. In this case, though, it demonstrates that we cannot trust the embedded C2PA timestamps created by Microsoft. Today it's one second. It may increase over time to two seconds, three seconds, etc.

    Out of Time

    Many of the C2PA-enabled files that I encounter have other timestamps beyond the C2PA metadata. It's problematic when the other timestamps in the file fail to align with the C2PA metadata. Does it mean that the external trusted authority signer is wrong, that the device requesting the signature is inaccurate, that the user's clock is wrong, or that some other timestamp is incorrect? Or maybe a combination?

    As an example, here's a picture that was edited using Adobe's Photoshop and includes an Adobe C2PA signature:

[Image: the red car picture, edited with Adobe Photoshop.]

    In this case, the picture includes XMP, IPTC, and EXIF timestamps. Putting them together into a timeline shows metadata alterations after the trusted notary timestamp:

    2022-02-25 12:09:40 GMT EXIF: Date/Time Original
    EXIF: Create Date
    IPTC: Created Date/Time
    IPTC: Digital Creation Date/Time
    XMP: Create Date
    XMP: Date Created
    2023-12-13 17:29:15 GMT XMP: History from Adobe Photoshop 25.2 (Windows)
    2023-12-13 18:22:00 GMT XMP: History from Adobe Photoshop 25.2 (Windows)
    2023-12-13 18:32:53 GMT x.509 Signed Timestamp by the authoritative third-party (DigiCert)
    2023-12-13 18:33:12 GMT EXIF: Modify Date
    XMP: History (Adobe Photoshop 25.2 (Windows))
    XMP: Modify Date
    XMP: Metadata Date
    2023-12-14 03:32:15 GMT XMP: History from Adobe Photoshop Lightroom Classic 12.0 (Windows)
    2024-02-06 14:31:58 GMT FotoForensics: Received

    With this picture:
    1. Adobe's C2PA implementation at Content Credentials doesn't identify any problems. The picture and metadata seem legitimate.
    2. The Adobe-generated signature covers the XMP data. Since the signature is valid, it implies that the XMP data was not altered after it was signed.
    3. The authoritative external timestamp authority (DigiCert) provided a signed timestamp. The only other timeline entry after this signature should be when FotoForensics received the picture.
4. However, according to the EXIF and XMP metadata, the file was further altered without invalidating the cryptographic signatures or the externally supplied timestamp. These modifications are timestamped minutes and hours after the last moment they could legitimately have occurred.
    There are a few ways this mismatched timeline can occur:
• Option 1: Unauthenticated: As noted by IBM: "Authentication is the process of establishing the identity of a user or system and verifying that the identity is valid." Validity is a critical step in determining authenticity. With this picture, it appears that the XMP metadata was postdated prior to signing by Adobe. This option means that Adobe will happily sign anything and there is no validation or authenticity. (Even though "authenticity" is the "A" in C2PA.)
• Option 2: Tampered: This option assumes that the file was altered after it was signed and the cryptographic signatures were replaced. In my previous blog entry, I demonstrated how easy it is to replace these C2PA signatures and how the X.509 certificates can have forged attribution.

At Hintfo, I use GnuTLS's certtool to validate the certificates.

      • To view the certificate information, use: c2patool --certs file.jpg | certtool -i
      • To check the certificate information, use: c2patool --certs file.jpg | certtool --verify-profile=high --verify-chain
      • To verify the digital signatures, use: c2patool -d file.jpg

      Although the digital signatures in this car picture appear valid, certtool reports a warning for Adobe's certificate:

      Not verified. The certificate is NOT trusted. The certificate issuer is unknown.

      In contrast to Adobe, the certs from Microsoft, OpenAI, Stability AI, and Leica don't have this problem. Because the certificate is unauthenticated, only Adobe can confirm if the public cert is really theirs. I'm not Adobe; I cannot validate their certificate.

      I also can't validate the DigiCert certificate because Adobe's c2patool doesn't extract this cert for external validation. It is technically feasible for someone to replace both Adobe's and DigiCert's certificates with forgeries.
    Of these two options, I'm pretty certain it's the first one: C2PA doesn't authenticate and Adobe's software can be used to sign anything.

    With this car example, I don't think this user was intentionally trying to create a forgery. But an "unintentional undetected alteration" actually makes the situation worse! An intentional forgery could be trivially accepted as legitimate.

    It's relatively easy to detect when the clock appears to be running fast, postdating times, and listing events after they could have happened. However, if the clocks were slow and backdating timestamps, then it might go unnoticed. In effect, we know that we can't trust postdated timestamps. But even if it isn't postdated, we cannot trust that a timestamp wasn't backdated.

    Time After Time

    This red car picture is not a one-off special case. Here are other examples of mismatched timestamps that are signed by Adobe:

[Image: the cheerleader picture.]
    The timeline from this cheerleader picture shows that the EXIF and XMP were altered 48 seconds after it was cryptographically signed and notarized by DigiCert. Adobe's Content Credentials doesn't notice any problems.

[Image: the photo of lights.]
    This photo of lights was notarized by DigiCert over a minute before the last alteration. Again, Adobe's Content Credentials doesn't notice any problems.

[Image: a third Adobe-signed example.]
    This picture has XMP entries that postdate the DigiCert notarized signature by 3 hours. And again, Adobe's Content Credentials finds no problems.

    Unfortunately, I cannot include examples received at FotoForensics that show longer postdated intervals (some by days) because they are associated with personal information. These include fake identity cards, medical records, and legal documents. It appears that organized criminal groups are already taking advantage of this C2PA limitation by generating intentional forgeries with critical timestamp requirements.

    Timing is Everything

Timestamps identify when files were created and updated. Inconsistent timestamps often indicate alterations or tampering. In previous blog entries, I demonstrated how metadata can be altered and signatures can be forged. In this blog entry, I've shown that we can't even trust the timestamps provided by C2PA steering committee members. Microsoft uses unsynchronized clocks, so we can't be sure when something was created, and Adobe will happily sign anything as if it were legitimate.

    In my previous conversations with C2PA management, we got into serious discussions about what data can and cannot be trusted. One of the C2PA leaders lamented that "you have to trust something." Even with a zero-trust model, you must trust your computer or the validation software. However, C2PA requires users to trust everything . There's a big difference between trusting something and trusting everything . For example:

Metadata
• C2PA requirement: C2PA trusts that the EXIF, IPTC, XMP, and other types of metadata accurately reflect the content.
• Forgeries: A forgery can easily supply false information without being detected. Adobe's products can be trivially convinced to authentically sign false metadata as if it were legitimate.
• Real world: We have seen Microsoft provide false timestamps and Adobe generate valid cryptographic signatures for altered metadata.

Prior claims
• C2PA requirement: C2PA trusts that each new signer verified the previous claims. However, C2PA does not require validation before signing.
• Forgeries: Forgeries can alter metadata and "authentically" sign false claims. The signatures will be valid under C2PA.
• Real world: The altered metadata examples in this blog entry show that Adobe will sign anything.

Signing certificates
• C2PA requirement: C2PA trusts that the cryptographic certificate (cert) was issued by an authoritative source. However, validation is not required.
• Forgeries: A forgery can create a cert with false attribution.
• Real world: In my previous blog entry, I quoted where the C2PA specification explicitly permits revoked and expired certificates, and I demonstrated how to backdate an expired certificate. As noted by certtool, Adobe's real certificates are not verifiable outside of Adobe.

Tools
• C2PA requirement: Evaluating C2PA metadata requires tools; we trust that the tools provided by C2PA work properly.
• Forgeries: The back-end C2PA library displays whatever information is in the C2PA metadata. Forged information will be displayed as valid by c2patool and the Content Credentials web site.
• Real world: Both c2patool and Content Credentials omit the provenance information that identifies the timestamp authority, and both misassociate the third-party timestamp with the first-party data signature.

Timestamps
• C2PA requirement: C2PA treats timestamps like any other kind of metadata; it trusts that the information is valid.
• Forgeries: A forgery can easily alter timestamps.
• Real world: We have seen misleading timestamps due to clock drift and other factors.

    The entire architecture of C2PA is a house-of-cards based on 'trust'. It does nothing to prevent malicious actors from falsely attributing an author to some media, claiming ownership over someone else's media, or manufacturing fraudulent content for use as fake news, propaganda, or other nefarious purposes. At best, C2PA gives a false impression of authenticity that is based on the assumption that nobody has ill intent.

Ironically, the only part of C2PA that seems trustworthy is the third-party timestamp authority's signed timestamp. (I trust that companies like DigiCert are notarizing the date correctly and I can test it by submitting my own signatures for signing.) Unfortunately, the C2PA specification says that using a timestamp authority is optional.

    Recently Google and Meta pledged support for the C2PA specification. Google even became a steering committee member. I've previously spoken to employees associated with both companies. I don't think this decision was because they believe in the technology. (Neither company has deployed C2PA's solution yet.) Rather, I suspect that it was strictly a management decision based on peer pressure. I don't expect their memberships to increase C2PA's reliability and I doubt they can improve the C2PA solution without a complete bottom-to-top rewrite. The only real benefit right now is that they increase the scope of the class action lawsuit when someone eventually gets burned by C2PA. Now that's great timing!

Tags: #Authentication, #FotoForensics, #Network, #Forensics, #Programming


      Catching Flies with Honey

pubsub.slavino.sk / hackerfactor · Sunday, 25 February - 18:15 · 13 minutes

Recently, the buzz around security risks has focused on AI: AI telemarketing scams, deepfake real-time video impersonations, ChatGPT phishing scams, etc. However, traditional network attacks haven't suddenly vanished. My honeypot servers have been seeing an increase in scans and attacks, particularly from China.

    Homemade Solutions

    I've built most of my honeypot servers from scratch. While there are downloadable servers, most of the github repositories haven't been updated in years. Are they no longer maintained, or just continuing to work well? Since I don't know, I don't bother with them.

    What I usually do is start with an existing stable server and then modify it into a honeypot. For example, I run a Secure Shell server (sshd) that captures brute-force login attempts. Based on the collected data, I can evaluate information about the attackers.

    Securing Secure Shell

    Secure Shell (ssh) is a cornerstone technology used by almost every server administrator. Every modern operating system, including MacOS, Linux, and BSD, includes an ssh client by default for accessing remote systems. For Windows, most technical people use PuTTY as an ssh client.

    Because it's ubiquitous, attackers often look for servers running the Secure Shell server (sshd). When they find it, they can be relentless in their brute-force hacking attempts. They will try every combination of username and password until they find a working account.

If you have an internet-accessible sshd port (default: 22/tcp) and look at your sshd logs (the location is OS-specific; try /var/log/system.log or /var/log/auth.log), then you should see tons of brute-force login attempts. These will appear as login failures and might list the username that failed.

    The old common wisdom was to move sshd to a non-standard port, like moving it from 22/tcp to 2222/tcp. The belief was that attackers only look for standard ports. However, the attackers and scanners have become smarter. They now scan for every open port. When they find one, they start to query the service. And when (not if) they find your non-standard port for sshd, they will immediately try brute forcing logins. Sure, they probably can't get in. But that doesn't stop them from trying continually for years.

These days, I've found that combining sshd with a port-knocking daemon is an ideal solution (sudo apt install knockd). Knockd watches for someone to connect to a few ports (port knocking), even if nothing is running on those ports. If knockd sees the correct knocking pattern, then it opens up the desired port only for the client who knocked. For example, your /etc/knockd.conf might look like:
    [options]
    UseSyslog
    Interface = eth0

    [ssh]
    sequence = 1234,2345,3456
    seq_timeout = 5
    start_command = /sbin/iptables -A INPUT -s %IP% -p tcp --dport 22 -j ACCEPT
    cmd_timeout = 60
    stop_command = /sbin/iptables -D INPUT -s %IP% -p tcp --dport 22 -j ACCEPT
    tcpflags = syn

This tells knockd to watch for someone trying to connect to ports 1234/tcp, 2345/tcp, and 3456/tcp in that specific order. The client has five seconds to complete the knocking pattern. If they do it, then port 22/tcp (sshd) will be opened up for the client. It will only be open for 60 seconds, so the client has one minute to connect. (Also be sure to configure ufw to deny access to 22/tcp from the general public!)
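A minimal sketch of that deny rule, assuming ufw is your firewall frontend (knockd's raw iptables rules and ufw's chains coexist, so verify the rule ordering on your own system):

sudo ufw deny 22/tcp      # block direct public access; knockd's start_command opens it per client IP
sudo ufw status verbose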

    After you connect, the knocking port is closed down. Your existing connection will continue to work, but any new logins will require you to repeat the knocking sequence.

    An attacker who is port scanning will never find the sshd port because it's hidden and they don't know the secret knock pattern.

For myself, I created a small shell function for ssh (an alias can't pass arguments into the middle of a command):

kssh() { knock -d 500 "$1" 1234 2345 3456; ssh "$@"; }


This function does the port knocking with a half-second delay (500 ms) between each knock and then runs the ssh client. The small delay is because sequential packets may take different network routes and arrive out of order; the delay helps them arrive in the correct order.
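Usage is then the same as ssh (the hostname here is just a placeholder):

kssh admin@server.example.com   # knocks 1234, 2345, 3456, then starts ssh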

    Since I deployed port knocking on my production servers, I've had zero scanners and attackers find my sshd. I don't see any brute-force login attempts.

    The Inside View

    I'm always hesitant to explicitly say how I've built or secured my own servers. I don't want to give the attackers any detailed insight. But in this case, knowing how I've hardened my own systems doesn't help the attackers. If they scan my server and find no sshd port, it could mean:
    • I'm not running an external sshd server on this network address.
    • I'm running it, but on a non-standard port.
• I'm running it, but it requires a special knocking sequence to unlock it, and they don't know the sequence. (With 65,536 possible ports, a three-port knock sequence allows over 2×10^14 possible combinations: 65,536^3 ≈ 2.8×10^14. And that's assuming that I'm using 3 ports; if I use 5 or more ports then it's practically impossible.)
    • Maybe I'm using both knockd and a non-standard port! Even if they find the knock sequence, they only have seconds to find the port. (I don't have to permit the port for 60 seconds; I could drop it down to 10 seconds to really narrow the window of opportunity.)
    • Assuming they can find the knock sequence and access the sshd port, then they still have to contend with trying to crack sshd, which is probably the most secure software on the internet. Will brute-force password guessing work? Or do I require a pre-shared key for login access? And every time they fail, they need to repeat the knock sequence, which adds in a lot of delay.
    On top of this, the act of scanning my servers for open ports is guaranteed to trigger a hostile scanner alert that will block them from accessing any services on my system.

    Not only do I feel safe telling people how I do this, I think everyone should do this!

    BYOHD (Build Your Own Honeypot Daemon)

    While it's usually desirable to hide sshd from attackers on production servers, a honeypot shouldn't try to hide. Turning a Secure Shell server into a honeypot server requires a little code change to sshd in order to enable more logging.

    Keep in mind, there are honeypot sshd daemons that you can download, but they are usually unmaintained. OpenSSH is battle tested, hardened, and maintained. Turning it into a honeypot means I don't need to worry about possible vulnerabilities in old source code.
    1. Since we're going to be logging every password attempt, we don't want to log your administrative login. You need to configure your sshd to permit logins using certificates based on pre-shared keys (PSK) and not passwords . This allows you (the administrator) to login without a password; you just need the PSK. There are plenty of online tutorials for generating the public/private key pair and configuring your sshd to support PSK-only logins. The main changes that you need in /etc/ssh/sshd_config are:

      ChallengeResponseAuthentication no
      PasswordAuthentication no
      UsePAM no
      PermitRootLogin no

      These changes ensure that you cannot login with a password; you must use the pre-shared keys.
    2. Honeypots generate lots of logs. I moved sshd's logs into a separate file. I redirected sshd logging by creating /etc/rsyslog.d/20-sshd.conf:
      template(name="sshdlog_list" type="list") {
      property(name="timereported" dateFormat="year")
      constant(value="-")
      property(name="timereported" dateFormat="month")
      constant(value="-")
      property(name="timereported" dateFormat="day")
      constant(value=" ")
      property(name="timereported" dateFormat="hour")
      constant(value=":")
      property(name="timereported" dateFormat="minute")
      constant(value=":")
      property(name="timereported" dateFormat="second")
      constant(value=" ")
      property(name="hostname")
      constant(value=" ")
      property(name="app-name")
      constant(value=":")
      property(name="msg" spifno1stsp="on" ) # add space if $msg doesn't start with one
      property(name="msg" droplastlf="on" ) # remove trailing \n from $msg if there is one
      constant(value="\n")
      }

      if $programname == 'sshd' then /var/log/sshd.log;sshdlog_list
      & stop
      Then I updated the log rotation by creating /etc/logrotate.d/sshd:
      /var/log/sshd.log
      {
      rotate 7
      weekly
      missingok
      notifempty
      compress
      delaycompress
      create 0644 syslog adm
      postrotate
      /usr/lib/rsyslog/rsyslog-rotate
      endscript
      }
Finally, restart the logging: sudo service rsyslog restart. (A quick test of this log routing is sketched after this list.)
    3. The easiest way to turn a supported server into a honeypot is to modify the source code. In this case, I download the source for OpenSSH and patch it to log every login attempt. Since I deploy this often, I ended up writing a script to automate this part:
      #!/bin/bash
      # For the honeypot: Create an openssh that logs passwords
      mkdir tmp
      cd tmp
      apt-get source openssh
      cd openssh-*
      patch << EOF
      --- auth-passwd.c 2020-02-13 17:40:54.000000000 -0700
      +++ auth-passwd.c 2023-02-25 10:31:53.946913899 -0700
      @@ -84,14 +84,20 @@
      #endif

      if (strlen(password) > MAX_PASSWORD_LEN)
      + {
      + logit("Failed login by host '%s' port '%d' username '%.100s', password '%.100s' (truncated)", ssh_remote_ipaddr(ssh), ssh_remote_port(ssh), authctxt->user, password);
      return 0;
      + }

      #ifndef HAVE_CYGWIN
      if (pw->pw_uid == 0 && options.permit_root_login != PERMIT_YES)
      ok = 0;
      #endif
      if (*password == '\0' && options.permit_empty_passwd == 0)
      + {
      + logit("Failed login by host '%s' port '%d' username '%.100s', password '' (empty)", ssh_remote_ipaddr(ssh), ssh_remote_port(ssh), authctxt->user);
      return 0;
      + }

      #ifdef KRB5
      if (options.kerberos_authentication == 1) {
      @@ -113,7 +119,12 @@
      #endif
      #ifdef USE_PAM
      if (options.use_pam)
      - return (sshpam_auth_passwd(authctxt, password) && ok);
      + {
      + /* Only log failed passwords */
      + result = sshpam_auth_passwd(authctxt, password);
      + if (!result) { logit("Failed login by host '%s' port '%d' username '%.100s', password '%.100s'", ssh_remote_ipaddr(ssh), ssh_remote_port(ssh), authctxt->user, password); }
      + return (result && ok);
      + }
      #endif
      #if defined(USE_SHADOW) && defined(HAS_SHADOW_EXPIRE)
      if (!expire_checked) {
      @@ -123,6 +134,8 @@
      }
      #endif
      result = sys_auth_passwd(ssh, password);
      + /* Only log failed passwords */
      + if (!result) { logit("Failed login by host '%s' port '%d' username '%.100s', password '%.100s'", ssh_remote_ipaddr(ssh), ssh_remote_port(ssh), authctxt->user, password); }
      if (authctxt->force_pwchange)
      auth_restrict_session(ssh);
      return (result && ok);
      @@ -199,7 +212,10 @@
      char *pw_password = authctxt->valid ? shadow_pw(pw) : pw->pw_passwd;

      if (pw_password == NULL)
      + {
      + logit("Failed login by host '%s' port '%d' username '%.100s', password '' (empty)", ssh_remote_ipaddr(ssh), ssh_remote_port(ssh), authctxt->user);
      return 0;
      + }

      /* Check for users with no password. */
      if (strcmp(pw_password, "") == 0 && strcmp(password, "") == 0)
      @@ -217,7 +233,9 @@
      * Authentication is accepted if the encrypted passwords
      * are identical.
      */
      - return encrypted_password != NULL &&
      - strcmp(encrypted_password, pw_password) == 0;
      + int result=0;
      + if (encrypted_password != NULL) { result = strcmp(encrypted_password, pw_password); }
+ if (result) { logit("Failed login by host '%s' port '%d' username '%.100s', password '%.100s'", ssh_remote_ipaddr(ssh), ssh_remote_port(ssh), authctxt->user, password); }
      + return ((encrypted_password != NULL) && (result == 0));
      }
      #endif
      EOF
      autoreconf && ./configure --with-pam --with-systemd --sysconfdir=/etc/ssh && make clean && make -j 3
      These patches are inserted everywhere a password is checked. They log the host, port, username, and attempted password.
    4. Finally, tell the server to run this sshd instead of the system one. (sudo install sshd /usr/bin/sshd ; sudo service sshd restart)
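Before exposing the honeypot, you can verify the log routing from step 2 with a quick test; logger's tag becomes the program name that rsyslog matches:

logger -t sshd "honeypot log-routing test"
tail -n 1 /var/log/sshd.log   # the test message should appear here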
    If your public servers are like mine, you'll start seeing entries in /var/log/sshd.log very quickly (under a minute). They might look like:
    2024-02-24 13:48:59 sshd: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.92.0.22  user=root
    2024-02-24 13:49:02 sshd: Failed login by host '218.92.0.22' port '58463' username 'root', password 'toor123'
    2024-02-24 13:49:02 sshd: Failed password for root from 218.92.0.22 port 58463 ssh2
    2024-02-24 13:49:03 sshd: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=43.153.207.98 user=root
    2024-02-24 13:49:05 sshd: Failed login by host '43.153.207.98' port '55874' username 'root', password 'qweASDqwe'
    2024-02-24 13:49:05 sshd: Failed password for root from 43.153.207.98 port 55874 ssh2
    2024-02-24 13:49:05 sshd: Failed login by host '218.92.0.22' port '58463' username 'root', password 'asdasd123'
    2024-02-24 13:49:05 sshd: Failed password for root from 218.92.0.22 port 58463 ssh2
    2024-02-24 13:49:06 sshd: Received disconnect from 43.153.207.98 port 55874:11: Bye Bye [preauth]
    2024-02-24 13:49:06 sshd: Disconnected from authenticating user root 43.153.207.98 port 55874 [preauth]
    2024-02-24 13:49:09 sshd: Failed login by host '218.92.0.22' port '58463' username 'root', password '456852'
    2024-02-24 13:49:09 sshd: Failed password for root from 218.92.0.22 port 58463 ssh2
    2024-02-24 13:49:10 sshd: Received disconnect from 218.92.0.22 port 58463:11: [preauth]
    2024-02-24 13:49:10 sshd: Disconnected from authenticating user root 218.92.0.22 port 58463 [preauth]
    2024-02-24 13:49:10 sshd: PAM 2 more authentication failures; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.92.0.22 user=root
    2024-02-24 13:49:13 sshd: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=43.134.111.125 user=root
    2024-02-24 13:49:15 sshd: Failed login by host '43.134.111.125' port '40806' username 'root', password 'P@ssw0rdd'
    2024-02-24 13:49:15 sshd: Failed password for root from 43.134.111.125 port 40806 ssh2
    2024-02-24 13:49:16 sshd: Received disconnect from 43.134.111.125 port 40806:11: Bye Bye [preauth]
    2024-02-24 13:49:16 sshd: Disconnected from authenticating user root 43.134.111.125 port 40806 [preauth]
    Now I have detailed logs about every brute-force login attempt.

Gathering Statistics

All of my honeypot tracking logs contain the string "Failed login by host". I can filter those lines to detect brute-force login attacks. From the sample log above:

    2024-02-24 13:49:02 sshd: Failed login by host '218.92.0.22' port '58463' username 'root', password 'toor123'
    2024-02-24 13:49:05 sshd: Failed login by host '43.153.207.98' port '55874' username 'root', password 'qweASDqwe'
    2024-02-24 13:49:05 sshd: Failed login by host '218.92.0.22' port '58463' username 'root', password 'asdasd123'
    2024-02-24 13:49:09 sshd: Failed login by host '218.92.0.22' port '58463' username 'root', password '456852'
    2024-02-24 13:49:15 sshd: Failed login by host '43.134.111.125' port '40806' username 'root', password 'P@ssw0rdd'
(Yes, that's five login attempts in under a minute from three different IP addresses! And that's typical.)

    After a few days, you can start creating histograms related to who attacks the most (IP address), what accounts are attacked the most (username), and what passwords are tried the most. For the last 7 days, my own honeypot has seen 4,934 unique brute force usernames and 19,453 unique brute force passwords from 2,308 unique IP addresses. The vast majority of attacks (56%) are from China, with Singapore coming in at a distant second with 7%, and the United States rounding out third at 5%.
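A minimal sketch of how to pull those histograms from the log, assuming the logit() format from the patch above (adjust the sed patterns if your format differs):

#!/bin/bash
# Count the most common attacking IPs, usernames, and passwords.
LOG=/var/log/sshd.log
for field in host username password; do
  echo "== Top ${field}s =="
  grep "Failed login by host" "$LOG" |
    sed -E "s/.*${field} '([^']*)'.*/\1/" |
    sort | uniq -c | sort -rn | head
done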

    The top 10 usernames account for 76% of all brute-force login attempts:
    # Sightings % Username
    1 42,732 69.55% root
    2 1,011 1.65% ubuntu
    3 918 1.49% admin
    4 622 1.01% user
    5 593 0.97% test
    6 339 0.55% oracle
    7 304 0.49% ftpuser
    8 304 0.49% postgres
    9 175 0.28% test1
    10 156 0.25% git

    In contrast, the top 10 passwords only account for about 10% of all password guesses:
    # Sightings % Password
    1 2,869 4.67% 123456
    2 851 1.39% 123
    3 321 0.52% 1234
    4 308 0.50% 1
    5 308 0.50% password
    6 293 0.48% 12345
    7 278 0.45% test
    8 274 0.45% admin
    9 268 0.44% root
    10 243 0.40% 111111
    (Don't use user "root" with password "123456" unless you want to be compromised in under an hour.)

I ran similar login metrics last year. The usernames list is almost the same; only 'debian' dropped out while 'test1' came in. Similarly, password '12345678' swapped positions with '111111' (the previous #11). By volume, the number of attacks has nearly tripled since last year.

    It's not just my sshd honeypot that has seen this increase in volume. All of my honeypot servers have seen similar increases in volume and mostly from China. A few days ago, the FBI Director warned of an ‘Unprecedented Increase’ in Chinese cyberattacks on US infrastructure . This definitely matches my own observations. They're not just attacking banks and power grids and telecommunications; they are attacking everyone. Even if your server isn't "critical infrastructure" or contains sensitive customer information, it can still be compromised and used to attack other systems. With the upcoming U.S. election and extreme unrest in Europe and the Middle East, it's time to batten down the cyber hatches. If you're not tracking attacks against your own servers and taking steps to mitigate attacks, then it's time to start. (Not sure where to begin? Make a beeline to my series on No-NOC Networking : simple steps to stop attacks before they happen.)

Tags: #Security, #Honeypot, #Programming, #Network


      The Jitter Bug

pubsub.slavino.sk / hackerfactor · Thursday, 15 February - 20:48 · 11 minutes

    I recently attended a presentation about an online "how to program" system. Due to Chatham House Rules , I'm not going to name the organization, speaker, or programming system. What I will say: as an old programmer, I often forget how entertaining it can be to watch a new programmer try to debug code during a live demonstration. (My Gawd, the presenter needs to go into comedy. The colorful phrases -- without swearing -- were priceless.) I totally understand the frustration. And while I did see many of the bugs (often before the presenter hit 'Enter'), the purpose was to watch how this new system helps you learn how to solve problems.

    At the end of the 45-minute presentation, it was revealed that this was the culmination of over two months of learning effort. But honestly, having seen the workflow and thought process, I think the speaker is on track to becoming an excellent software guru. At this point, the methodology is known and it just takes experience to improve.

    It's a feature! Ship it!

    As someone who works with computers every day, I know that tracking down bugs can be really hard. In my opinion, there are four basic difficulty levels when debugging any system:
    • Level 1: Easy . Sometimes you get lucky. Maybe the system generates an informative error message. Programs sometimes alert you to a bad configuration file, missing parameters, or incorrect usage. Compilers often identify the line number with an issue. (And sometimes it's the correct line number!) Other times there might be helpful log messages that tell you about the problem.
    • Level 2: Medium . More often, the error messages and logs provide hints and clues. It's up to you to figure out where the error is coming from, what is causing the error, and how to fix it. Because of the familiarity, problems in your own code are usually easier to debug compared to problems in someone else's code. In the worst-case, you might end up consulting online manuals (man pages), documentation, or even diving into source code. Blind debugging, when you have no code or documents, is much more difficult.
    • Level 3: Frustrating . The hardest problems to resolve are when bugs appear inconsistently. Sometimes it fails and sometimes it works. These are much more difficult to track down. I hate those bugs that appear to vanish when you put in debugging code, but that resurface the instant the debugging code is disabled. Or that work fine under a debugger, like gdb or valgrind, but consistently fail without the debugger. (Those are almost always due to dynamic library issues or memory leaks, and the failure often surfaces long after the problem started.)
    • Level 4: Soul-Crushing . The worst-case scenarios are the ones that appear to happen randomly and leave no logs about the cause. Any initial debugging is really just a blind guess in the dark.
    I've been battling with one of those worst-case scenarios for nearly 2 years -- and I finally got it solved. (I think?)

    Reboot is Needed

I have a handful of servers in a rack. Each piece of hardware has plenty of RAM, CPUs, and disk space. But rather than running each as a big computer with tons of CPU power and memory, I've subdivided the resources into a handful of virtual machines. I may allocate 2 CPUs and 1 GB of RAM to my mail server, and 6 CPUs with more RAM to FotoForensics. The specific resource allocation is configurable based on each VM's requirements.

    For my servers, the hypervisor (parent of the virtual machines, dom0) uses Xen. Xen is a very common virtualization environment. Each piece of hardware has a dom0 and runs a group of virtual machines (VMs, or generically called domu).


    The problem I was having: occasionally a CPU on one VM would hang. The problem seemed to jump around between VMs and didn't appear regularly. The error in the VM's kernel.log looked like:

    kernel: [2333839.516291] RIP: e030:zap_pte_range.isra.0+0x168/0x860
    kernel: [2333839.516298] Code: 00 10 00 00 e8 c9 f6 ff ff 49 89 c0 48 85 c0 74 0b 48 83 7d 88 00 0f 85 0a 06 00 00 41 f6 47 20 01 0f 84 d7 02 00 00 4c 8b 23 <48> c7 03 00 00 00 00 4d 39 6f 10 4c 89 e8 49 0f 46 47 10 4d 39 77
    kernel: [2333839.516316] RSP: e02b:ffffc9004114ba60 EFLAGS: 00010202
    kernel: [2333839.516322] RAX: ffffea000192fe40 RBX: ffff88807854e760 RCX: 0000000000000125
    kernel: [2333839.516330] RDX: 0000000000000000 RSI: 00007f2410aec000 RDI: 00000000135f9125
    kernel: [2333839.516339] RBP: ffffc9004114bb10 R08: ffffea000192fe40 R09: 0000000000000000
    kernel: [2333839.516347] R10: 0000000000000001 R11: 000000000000073f R12: 00000000135f9125
    kernel: [2333839.516355] R13: 00007f2410aec000 R14: 00007f2410aed000 R15: ffffc9004114bc48
    kernel: [2333839.516371] FS: 00007f2410bf1580(0000) GS:ffff88807d500000(0000) knlGS:0000000000000000
    kernel: [2333839.516380] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
    kernel: [2333839.516387] CR2: 00007f2410ae1151 CR3: 0000000002a0a000 CR4: 0000000000040660
    kernel: [2333839.516398] Fixing recursive fault but reboot is needed!

    In this case, the log shows that the CPU had a failure. However, it doesn't identify what caused the failure. Was it a hardware problem? Some bad software? Or something else? There's also no information about how to fix it other than "reboot is needed!"

When this error happened, the bad CPU would be pinned at 100% usage while not doing anything. If the VM had 2 CPUs, then it would limp along with one good CPU until the VM was rebooted. However, sometimes one CPU would die and then the other would die (sometimes minutes apart, sometimes days). At that point, the entire VM would be dead and require a reboot. The last failure never made it to the logs.

    I started calling this problem the 'jitter bug' because it happened irregularly and infrequently. There was some unidentified event happening that was capable of hanging a CPU at random.

    Debugging a Bad Bug

The jitter bug was limited to the VMs: dom0 never experienced this kind of crash, and it had no logs related to any of these domu CPU failures. When a VM failed, I could use dom0 to destroy the instance and recreate the VM. The new VM would start up without a problem. Whatever was locking the CPU was limited in scope to the VM.

    I searched Google for "Fixing recursive fault but reboot is needed". It currently has over 3,000 results, so I'm definitely not the only person seeing this problem. For me, the problem was irregular but it would happen at least every few weeks on at least one randomly chosen VM across all of my hardware servers. Other people reported the problem happening daily, or every few days. I also noticed a commonality: in almost every reported instance, they were using a virtualized server.

    This is when I went through the long debugging process, trying to catch a server crash that happens intermittently, and often weeks apart. I ended up writing a ton of monitoring tools that watch everything from packets to processes. All of this effort was really to debug the cause of this CPU hang. What I found:
    • Hardware failure . I ruled this out. I had four different hardware servers acting the exact same way. The chances of having the exact same hardware failure appear on four different servers (different ages, different models) was extremely unlikely.
    • My custom settings . Different VMs running different software and with different configurations were experiencing the same problem. Also, other people in various forums were reporting the same problem, and they were not using my software or settings. I could rule out my own software and customizations.
    • Packet of Doom ™. I was concerned that someone might have found a new " ping of death " or some other kind of killer packet. I configured a few boxes that would capture every packet sent to and from the VMs. (I rotated the logs hourly.) I did catch every packet around two different crashes. Nothing unusual, so I ruled out a networking issue.
• Kernel patch. A few forums suggested applying a kernel patch or upgrading Xen. I tried that on a test system, but it had no impact. The problem still happened.
• Operating system. The domu virtual machines don't need to run the same operating system as the parent dom0. On my test system, I installed a different OS. It took a month, but it crashed the same way. This means that the problem is independent of the guest VM operating system.
• Blocking issue. On one of the Xen forums, a person from Amazon suggested that it might be a block device deadlock situation. The workaround is to disable the underlying block device merges. They didn't say where to apply this, so I put it in both dom0 and every domu. While this is probably overkill, it did result in a change!

      1. The CPU failures happened less often. (Almost every 3 weeks instead of roughly every 2 weeks.)
      2. When they happened, they usually didn't hang the CPU or require a reboot. The system usually recovered. In kernel.log, I'd see a similar CPU failure trace, but it rarely had the 'reboot needed' message and the CPU wasn't hung. (Having a CPU report an error is really bad, but it's a huge improvement over a hung VM.)

      Unfortunately, I was still seeing an occasional hang with a reboot requirement.
    With the block device workaround, I didn't notice any performance problem and the hangs happened much less often.
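For reference, a minimal sketch of what disabling block device merges looks like when applied through the block layer's sysfs knob; the device name (xvda) is only a placeholder:

echo 2 | sudo tee /sys/block/xvda/queue/nomerges   # 0 = merges enabled, 2 = all merges disabled
# Repeat for each device, and persist it (rc.local, a udev rule, etc.) if it helps.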

    Backtrace

    Each of these different tests took weeks to perform. This is why it's taken me years to find a solution. Thinking back on it, I have been battling with this problem for almost as long as I've had the servers in the new rack.

    Wait... the new rack? The new location?

    When I moved all of my servers out of my former hosting location (they went out of business ), I reinstalled the OS on each server. The previous OS was old and losing vendor support, so I needed to upgrade. Upgrading during the move seemed like a good idea at the time. Looking over every other reported instance of this error, I noticed that each sighting was related to a newer operating system. This looked like some kind of incompatibility between Xen and the underlying OS -- either Ubuntu or Debian.

    I ran a test and installed the really old OS on my spare server: Ubuntu 16.04 LTS, from 2016. Yup, no instance of the problem, even though the same hardware running Ubuntu 20.04 LTS had the bug. (It took me two months to confirm this since the problem is irregular.) Unfortunately, rolling back the OS on my production servers is a no-go. I needed a fix for a supported OS.

    Bingo! (Maybe?)

    This got me thinking. The problem never appeared on dom0. But what if dom0 was the cause? And what if it was caused by something found in the newer OS versions that didn't previously exist?

    Buried in the logs of dom0 was an update process that ran every few hours. It's called fwupd, the firmware update daemon. According to their github repository , "This project aims to make updating firmware on Linux automatic, safe, and reliable." Ubuntu appears to have incorporated it into 18.04 LTS (circa 2018), which is the same time people began reporting this CPU hang problem. Every version of Ubuntu since then has included this process.

    To see if your system is using it, try this command: sudo systemctl list-timers | grep fwupd
    You should see a line that says when it will run next and when it last ran:

    $ sudo systemctl list-timers | grep fwupd
    Thu 2024-02-15 09:43:37 MST 18min left Thu 2024-02-15 05:37:05 MST 3h 47min ago fwupd-refresh.timer fwupd-refresh.service


    On my system, /usr/lib/systemd/system/fwupd-refresh.timer says to run the process twice a day, with a random delay of up to 12 hours. This explains why the crashes happened at random times:

[Unit]
Description=Refresh fwupd metadata regularly
ConditionVirtualization=!container

[Timer]
OnCalendar=*-*-* 6,18:00
RandomizedDelaySec=12h
Persistent=true

[Install]
WantedBy=timers.target


When fwupd runs, it queries the existing firmware and then checks whether it needs to apply any updates. The act of querying the firmware from Xen's dom0 can hang VMs. As a test, I repeatedly called "fwupdmgr get-devices" and eventually forced a CPU hang on domu. The hang isn't always immediate; I've clocked it as happening as much as 10 minutes after the process ran! This delayed failure is why I wasn't able to associate the hang with any specific application; the crash wasn't immediate. It also appears to be a race condition: on my servers, it's about a 1 in 50 chance of a hang, which explains why I usually saw any given CPU hang at least monthly. I'm sure the odds of a hang vary based on your hardware, which would explain why some people see this same problem more often.
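A sketch of that kind of repeated query, run from dom0 (don't try this on a production host; on an affected system it may eventually hang a guest CPU):

for i in $(seq 1 100); do
  sudo fwupdmgr get-devices > /dev/null 2>&1   # each query is another roll of the dice
  sleep 5
done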

    I disabled this daemon last month. (It's really unnecessary.)

    sudo systemctl stop fwupd fwupd-refresh fwupd-refresh.timer
    sudo systemctl disable fwupd fwupd-refresh fwupd-refresh.timer
    sudo systemctl mask fwupd fwupd-refresh fwupd-refresh.timer


    These three commands are basically (1) stop running, (2) never run, and (3) don't allow any other process to make it run.
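To confirm that nothing is still scheduled, a quick check (the exact output will vary):

systemctl status fwupd-refresh.timer --no-pager
sudo systemctl list-timers | grep fwupd || echo "no fwupd timers scheduled"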

    Poof! It's been a month and a half since I last saw any CPU failures on any of my servers. While this isn't proof of a fix, it does give me a high sense of confidence. Rather than doing a monthly reboot "just in case" it fixes the problem, I'm going to try to go back to rebooting only when the kernel is upgraded due to a security patch. (I like having stable systems with uptimes that are measured in months or years.)

    Debugging computer problems can vary from simple typos to complex interactions. In this case, I think it's the combination of Xen, fwupd, and the hardware that causes a random timing error, race condition, and a hardware hang. I wish I had some colorful description for this problem that didn't involve swearing.

Tags: #Network, #Security, #Programming


      12 Years at FotoForensics

pubsub.slavino.sk / hackerfactor · Friday, 9 February - 20:07 · 7 minutes

    Today, my FotoForensics service turns 12 years old! It has over 6,800,000 unique pictures. That's up about 800,000 from last year . The site's popularity has increased by about 12%, from being in the top 80,000 of internet destinations last year to about 70,000 right now. (By comparison, the top five internet destinations are currently Google, Facebook, Youtube, [surprisingly] Twitter, and Instagram. None of my services are that popular, yet.)

    Unexpectedly Popular

    I actually find the site's popularity kind of surprising, considering all of the wide-area bans. For example:

    Banned: Tor
I've been actively blocking uploads from the Tor network since 2017 (7 years and counting). This is because over 90% of uploads from Tor violated my terms of service. Besides the IDs (for fraud) and bulk uploading, Tor users account for over 50% of the uploaded child porn (yuck). Initially, the bans were set manually, but I quickly added automated rules. Today, the detection of Tor uploads is completely automated. While I see the "Tor blocked" messages in the logs, I don't give it a second thought.

    Banned: Russia
Before being banned, Russia was almost as abusive as Tor. Bulk uploading, porn, child porn, fake IDs, drug distribution photos, and much more. It wasn't any single user or group; it was widespread and endemic. When I'd catch them, I'd ban them. However, the Russian mentality when getting caught is to evade the ban and then increase in volume. By 2020, I'd had enough of this and issued a country-wide ban.

    To put this ban into perspective, the public FotoForensics site does not permit uploads of ID, drivers licenses, passports, etc. I've been keeping metrics on these types of uploads:

[Image: per-country chart of ID-card upload volume.]

Black and white means I've never received any ID pictures from that country. (E.g., you'll never see a Greenland passport because it's part of Denmark.) The volume per country goes from bright green (few sightings) to bright red (the most sightings). Even with zero uploads from Russia for the last 4 years, Russia still has more ID uploads than any other country. In fact, four years after banning their uploads, Russia has still uploaded more IDs than the next three countries combined (China, Syria, and Indonesia).

    Since putting the country-wide ban in place, I've received many unban requests from Russians, including:
    • People claiming to be Russian law enforcement. As proof of their legitimacy, some provided photos of fake police IDs and fake police badges. Others gave fake names and addresses for police stations that don't exist. None of these requests appear to be real law enforcement.
    • A few Russian service providers have asked to be unbanned. However, they were part of the original problem and have done nothing to mitigate the abuse that emanates from their networks. One was rather chatty and made it clear: they should not be held responsible for abuses from their customers, even if the problem comes from a majority of them.
    • Two people claiming to be with Russian media asked to be unbanned. However, I couldn't authenticate either of them and they were coming from shared networks where people (let's assume other people and not them) were uploading prohibited content.
    When banned, these users would occasionally switch to VPNs or proxy networks and continue their uploads. These abuses would be quickly caught and all uploads from the proxy network would be blocked. Today, it is very difficult for anyone in Russia to upload anything to FotoForensics. And if they find a way to upload and violate my terms of service (porn, IDs, etc.), then they get blocked fast.

Because of the country-wide ban, I've also received a lot of hate email from Russians. They call me a tool of the West, a 'liberal', or a pro-Ukrainian propagandist. My thinking here is pretty simple: their country was banned for widespread abuses. Writing hostile messages to me doesn't alleviate the abuse.

    Banned: China
    I really went out of my way to not ban all of China, but eventually I had no other option.
    • The first country-wide ban lasted a few days. The ban message even explicitly mentioned why there was a country-wide ban and included sample images. When the ban lifted, the abuses resumed a few days later.
    • The second country-wide ban lasted a week. Again, it resumed when the ban lifted.
    • The third country-wide ban lasted a month. But, again, it didn't stop the problem.
    • China has been banned for a few months now, and the ban will last for one year.
    Because Russians managed to get most free or inexpensive VPNs and proxy services banned, it's really hard for Chinese people to evade the ban.

    Unlike Russia, the abuses from China are not from a majority of Chinese citizens. It seems to come from a handful of very prolific groups. However, because of how China has allocated their national internet service, I have no easy method to distinguish the prolific abusers from the non-abusive population. I had expected to see a drop in volume at FotoForensics when I banned China. Surprisingly, this didn't happen. Instead, I saw a 5% increase in uploads and these new uploads were not violating my terms of service! I suspect that people in China are talking to people outside of China, and that is driving more volume to my site.

    You can't spell "pain" without "ai"

    The public FotoForensics service is explicitly used for research. (It's mentioned 18 times in the FAQ .) When doing any kind of research, you need sample data. Most researchers go out and harvest content. This creates a bias due to their search criteria. With FotoForensics, I just wait for people to upload real-world examples. Of course, this does introduce a different bias: the service receives pictures when people question the legitimacy or are testing; it rarely receives unquestioned legitimate pictures.

    This bias allows me to determine what people are most concerned about. And right now? It's AI. Whether the picture is completely AI generated or altered using AI systems, people have legitimate concerns. Here are some recent examples (click on the picture to view at FotoForensics):

[Image: picture appearing to show a child soldier in Ukraine.]

    This first picture appears to show a child soldier in Ukraine. However, the metadata was stripped out; we cannot identify the source. (Also, you might not notice that his limbs look unnatural.) In this case, Stability AI (the company behind Stable Diffusion) has been adding watermarks to some of their pictures. These show up as binary repeating dot sequences in the compression level (ELA):

    analysis.php?id=e1f4e57ea27d4e6b1905d1fff3243909c87f09e2.2001938&fmt=ela&size=400

    I can easily detect this watermarking; this picture came from Stable Diffusion before the metadata was removed. However, I haven't figured out how to decode the watermark yet. (Help would be appreciated!) I wouldn't be surprised if it encoded a timestamp and/or account name.
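    For readers unfamiliar with ELA, here is a minimal sketch of the general technique (not the exact FotoForensics implementation), using Pillow and a hypothetical filename. Resaving a JPEG at a known quality and differencing it against the original highlights regions that recompress differently; a regular grid of dots in that difference image is the kind of watermark signal described above.

    from PIL import Image, ImageChops

    original = Image.open("suspect.jpg").convert("RGB")     # hypothetical input file
    original.save("resaved.jpg", "JPEG", quality=90)        # recompress at a known quality
    resaved = Image.open("resaved.jpg")

    # Regions that recompress differently from their surroundings stand out in the
    # difference image; a repeating dot pattern here suggests an embedded watermark.
    ela = ImageChops.difference(original, resaved)
    ela = Image.eval(ela, lambda value: min(255, value * 20))   # amplify for visibility
    ela.save("ela.png")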

    Not every AI-generated image is as easy to detect as Stable Diffusion. Consider this picture, titled "Mea Sharim", which refers to an ultra-Orthodox Jewish neighborhood in Jerusalem:

    analysis.php?id=39ee678c2871c1b18643246d7b6f40d1b6823ff6.5979759&fmt=orig&size=600

    At first glance, it appears to show an Orthodox man standing on a corner. However, the ELA identifies an AI-generated erasure:

    analysis.php?id=39ee678c2871c1b18643246d7b6f40d1b6823ff6.5979759&fmt=ela&size=600

    The erased area is adjacent to the man and appears a little taller. The erased area also seems to have an erased shadow. With some of these "smart erase" tools, you can select an area and have the AI fill in the void. That appears to be the case here. While I can conclusively detect the altered area, I can only speculate about what was erased. I suspect that the visible man was facing another person. The artist selected the other person and his/her shadow and performed a smart erase.

    Both of these pictures could easily be used for propaganda. Having metadata that identifies a vague statement about a digital alteration using AI means nothing when the metadata can be trivially removed and replaced.

    What's New?

    Usually I try to celebrate each new year at FotoForensics with a new feature release. For the last few months, I've been doing a lot of behind-the-scenes modifications. This year, rather than doing one big update, I plan to continue the rolling updates with more changes coming down the line. This includes addressing some of the AI-generated content later this year. (If you're on the commercial service , then you should be seeing some of the new detection systems right now in the metadata analyzer.)

    After 12 years, I'm still making regular improvements to FotoForensics, and enjoying every minute of it. I'm very thankful to my friends, partners, various collaborators, and the public for 12 years of helpful feedback, assistance, and insights. This year, I especially want to thank my mental support group (including "I'm Bill not Bob", "I'm Bob not Bill", and Dave), my totally technical support group (Marc, Jim, Richard, LNM, Troy's evil cat, and everyone else), Joe, Joe, Joe, AXT, the Masters and their wandering slaves, Evil Neal , Loris, and The Boss. Their advice, support, assistance, and feedback has been invaluable. And most importantly, I want to thank the literally millions of people who have used FotoForensics and helped make it what it is today.

    Značky: #Forensics, #FotoForensics, #Network

    • Ha chevron_right

      Save The Date

      pubsub.slavino.sk / hackerfactor · Monday, 5 February - 22:01 edit · 18 minutes

    Whether it is carpentry, auto mechanics, electrical engineering, or computer science, you always hear the same saying: use the right tool for the right job. The wrong tool can make any solution difficult and could introduce new problems.

    I've previously written about some of the big problems with the C2PA solution for recording provenance and authenticity in media. However, I recently came across a new problem based on their decision to use X.509 certificates and how they are used. Specifically: their certificates expire. This has some serious implications for authentication and legal cases that try to use C2PA metadata as evidence.

    X.509 Background

    Whether it's HTTPS or digital signatures, the terms "certificates", "X.509", and "x509" are often used interchangeably. While there are different types of digital certificates, most use the X.509 format. The name "X.509" refers to a section in the ITU standards. The different parts in the standard are called "Series". There are Series A, D, E, F, ... all the way to Z. (I don't know why they skipped B or C.) Series A covers their organization. Series H specifies audiovisual and multimedia systems. Series X covers data networks and related security. Within Series X, there are currently 1,819 different sections. "X.509" refers to Series X, section 509. The name identifies the section that standardizes the digital certificate format for public and private key management. In general, when someone writes about certificates, certs, X.509, or x509, it's all the same thing.

    X.509 technology has been around since at least November 1988. That's when the first edition of the specification was released. It's not a perfect technology (they're on the ninth major revision right now), but it's good enough for many day-to-day uses.

    I'm not going to claim that X.509 is simple. Public and private key cryptography is very technical, and the X.509 file format is overly complicated. Years ago, I wrote my own X.509 parser for both binary (DER) and encoded text (PEM) formats, with support for individual certs and chains of certs. You really can't comprehend the full depth of the complexity until you try to implement your own parser. This is why almost everyone relies on someone else's existing library. The most popular library is OpenSSL.

    In an overly simplified view, each X.509 certificate contains a public key, and may also include a private key and/or a chain of parent certificates. (A parent cert can be used to issue a child cert, creating a chain of certificates.) Data encrypted with the public key can only be decrypted with the private key, and vice versa. Usually the private key is not distributed (kept private) while the public key is shared (made public).
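    The "vice versa" direction is what digital signatures rely on: data signed with the private key can be verified by anyone who holds the public key. As a minimal sketch (using Python's cryptography library and a freshly generated RSA key; this is not any vendor's actual signing setup):

    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa, padding

    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    public_key = private_key.public_key()

    data = b"some manifest bytes"
    signature = private_key.sign(data, padding.PKCS1v15(), hashes.SHA256())

    # verify() raises InvalidSignature if the data or the signature was altered.
    public_key.verify(signature, data, padding.PKCS1v15(), hashes.SHA256())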

    X.509 Metadata

    Buried inside the X.509 format are parameters that identify how the certificate can be used. Some certs are only for web sites (HTTPS), while others may only be for digitally signing documents.

    Most certs also include information about the issuer and the issuee (the 'subject'). These are usually encoded text notations, such as:
    • CN: Common Name
    • OU: Organizational Unit
    • O: Organization
    • L: Locality (City or Region)
    • ST: State or Province Name
    • C: Country Name
    So you might see a cert containing:

    Subject: /CN=cai-prod/O=Adobe Inc./L=San Jose/ST=California/C=US/emailAddress=cai-ops@adobe.com

    Issuer: /C=US/O=Adobe Systems Incorporated/OU=Adobe Trust Services/CN=Adobe Product Services G3

    Depending on the X.509 parser, you might see these fields separated by spaces or converted to text (e.g., Country: US, Organization: Adobe Inc., etc.)

    The reason I'm diving into X.509 is that C2PA uses these certs to digitally sign the information. As I previously demonstrated with my forgeries, you can't assume that the Subject or Issuer information represents the actual certificate holder. The model assumes that every issuing certificate provider, starting with the top-level CA provider, is reputable and fills in this information using validated and authenticated credentials. However, that's not always the case:
    • 2011: Fraudulent certs were issued by a real CA provider for Google, Microsoft, Yahoo, and Skype.
    • 2012: The trusted and reputable CA provider "TURKTRUST" issued unauthorized certs for Google, Microsoft, and others. (Even though the problem was identified, it took nearly a year before TURKTRUST was removed as a trusted CA provider.)
    • 2013: Google and Microsoft again discovered that fraudulent certs were issued by a real CA provider for some of their domains.
    • 2015: A trusted and reputable CA provider in China was caught issuing unauthorized certificates for Google. As crypto-expert Bruce Schneier wrote , "Yet another example of why the CA model is so broken."
    This is far from a complete list. It doesn't happen often, but news reports every 1-2 years are enough to recognize that the threat is very real. Moreover, if the impersonated company (like Leica) isn't as big as Google or Microsoft, then it might not notice the forged certificate.

    X.509 Dates

    Inside each cert should also be a few dates that identify when the certificate is valid:
    • Not Before : A cert may be created at any time, but is not valid until after a specific date. Often (but not always), this is the date when the certificate was generated.
    • Not After : Certs typically have an expiration date. After that date, the cert is no longer valid. With LetsEncrypt, certificates are only good for 3 months at a time. (LetsEncrypt provides a helper script that automatically renews the cert so administrators don't forget.) Other certificate authorities (CA servers) may issue multi-year certificates. The trusted CA providers (top of the cert chain) often have certs that are issued for a decade or longer.
    With C2PA, every cert that I've seen (both real and forged) includes these dates. Similarly, every web server certificate (even a self-signed one) has these dates. In your desktop browser, you can visit any web site and click on the little lock in the address bar to view the site's certificate information. It will list the issuer, subject, and valid date ranges.
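    If you'd rather inspect a certificate file directly, here is a minimal sketch using Python's cryptography library (the filename is hypothetical); it prints the same Subject, Issuer, Not Before, and Not After fields discussed above:

    from cryptography import x509

    with open("signing_cert.pem", "rb") as f:       # hypothetical certificate file
        cert = x509.load_pem_x509_certificate(f.read())

    print("Subject:   ", cert.subject.rfc4514_string())
    print("Issuer:    ", cert.issuer.rfc4514_string())
    print("Not Before:", cert.not_valid_before)
    print("Not After: ", cert.not_valid_after)

    The same information is available from the command line with something like openssl x509 -in cert.pem -noout -subject -issuer -dates.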

    Expired Certs and Signing

    On rare occasions, you might encounter web sites where the certificate has expired. The web browser detects the bad server certificate and displays an error message:
    analysis.php?id=74854ca171ce61120b4b61a3db4d2cdc45eced5c.90630&fmt=orig&size=600

    For a web site:
    • If you don't know what's going on, then the error message stops you from visiting a site that uses an invalid/expired certificate.
    • The temporary workaround is to ignore the expired date, use the certificate anyway, and visit the web site. (NEVER do this if the web site requires any kind of login.)
    • The long-term solution requires the web administrator to update the certs for the site.
    But what about digital signatures, such as those issued with DocuSign or C2PA? In these cases, the public certificate is included with the signature. Since the public cert is widely distributed, they can't just reach out across the internet and update all of them.

    An invalid certificate must never be used to create a new signature, but it can be used to validate an existing signature. As noted by GlobalTrust (a trusted certificate authority), there are a few situations to consider when evaluating the time range of a cert used for a digital signature:

    Case #1: Post-date : This occurs when the current time is before the Not Before date. (You almost never encounter this.) In this case, we can prove that an invalid certificate was used to sign the data. The signature must be interpreted as invalid, even if the certificate can be used to confirm the signature. (An invalid certificate cannot be used to create a valid signature.)

    Case #2: Active : If the current date falls between the Not Before and Not After dates, then the certificate's time range is valid. If the signature checks out, then the signature is valid.

    Case #3: Expired : This is where things get a little complicated. As noted by GlobalTrust, "Expired certificates can no longer produce valid signatures -- however, signatures set before expiry remain valid later. Software that issues an error here does not work in accordance with the standard."

    This means two things:
    1. You are not supposed to use an expired certificate to sign something after it has expired. An expired certificate is invalid for signing, and any new signatures must be treated as invalid.
    2. If you have an expired certificate that can validate a signature, then the signature is valid.
    If (like most of the world) you use OpenSSL to generate a digital signature, then OpenSSL checks the dates. The library refuses to sign anything using an expired certificate. And since Adobe's c2patool uses OpenSSL, it also can't sign the C2PA metadata using an expired certificate.

    Except... There's a nifty Linux command-line tool called "faketime" (sudo apt install faketime). This program intercepts an application's calls to time functions like time() and clock_gettime() and returns a fake time to the calling application. To backdate the clock by 2 years for a single application, you might use:

    faketime -f -2y date # display the date, but it's backdated 2 years
    faketime -f -2y openssl ... # force openssl to sign something as if it were 2 years ago
    faketime -f -2y c2patool ... # force c2patool to sign using an expired certificate

    This way, you can generate a signature that appears valid using an expired certificate. After the expiration date, the backdated signature will appear valid because it can be verified by an expired certificate. (Doh!)

    (Using faketime is much easier than taking your computer offline, rolling back the system clock, and then signing the data.)

    This means that a malicious user with an expired certificate can always backdate a signature. So really, we need to change Case #3 to make it more specific:

    Case #3: Expired and Previously Verified : If the cert is currently expired and you, or someone you trust, previously verified the signature when it wasn't expired, then the signature is still valid.

    Case #4: Expired and Not Previously Verified (by someone you trust) : What if you just received the document and the validating certificate is already expired? In this case, you can't tell if someone backdated the signature using something like 'faketime'.

    Even if you trust the signing authority, and even if other certificates in the chain are still valid, that doesn't mean that someone didn't use the expired certificate with a backdated clock to create the signature. In this case, you cannot trust the signature and it must be treated as invalid .
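    Putting the four cases together, here is a minimal sketch of the date logic (my own summary of the reasoning above, not C2PA's or GlobalTrust's actual validation code). It assumes the cryptographic check itself has already passed and only reasons about time:

    from datetime import datetime, timezone

    def classify(not_before, not_after, previously_verified):
        """not_before/not_after are timezone-aware datetimes from the signing cert;
        previously_verified means someone you trust validated the signature before expiry."""
        now = datetime.now(timezone.utc)
        if now < not_before:
            return "Case 1: post-dated cert -- the signature must be treated as invalid"
        if now <= not_after:
            return "Case 2: cert is active -- trust the cryptographic check"
        # The cert is expired. The math may still check out, but a backdated clock
        # (e.g., faketime) could have produced the signature after expiry.
        if previously_verified:
            return "Case 3: expired, but previously verified -- still valid"
        return "Case 4: expired and never verified by someone you trust -- cannot be trusted"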

    Hello, Adobe

    The reason I'm bringing up all of this: Adobe's certificate that they used for signing C2PA metadata expired on 2024-02-01 23:59:59 GMT. At FotoForensics, I have over a hundred examples of files with C2PA metadata that are now expired. I also have copies of a few dozen pictures from Microsoft that are about to expire (Not After 2024-03-31 18:15:27 GMT) and a few examples from other vendors that expire later this year.
    • In theory, the signer can always authenticate their own signature. Even after it expires, Adobe can authenticate Adobe, Microsoft can authenticate Microsoft, etc. This is because Adobe can claim that only Adobe had access to the cert and nobody at Adobe backdated the signature. (Same for Microsoft and other companies.)
    • If the signer is trusted, such as DocuSign (used for signing legal documents), then we can trust that it was previously validated. (This is Case #3 and we can trust the signature.)
    But what about the rest of us? I'm not Adobe, Microsoft, Leica, etc. Maybe I don't trust all of the employees at these companies. (I shouldn't be required to trust them.) Consider this picture (click to view it at FotoForensics or Adobe's Content Credentials ):

    analysis.php?id=74a1e6383e67014c17b1d65b4a868089e78d675a.512372&fmt=orig&size=600
    • If you only trust the C2PA signature, then you have an expired certificate that validates the signature. Adobe's Content Credentials doesn't mention that the certificate is expired.
    • If you look at the timestamps, it claims that it was signed by Adobe on 16-Jan-2024, which is 16 days before the certificate expired. This means you had about two weeks to validate it before it expired. That's not very long.
    • If you look at the metadata, you can see that it was heavily altered. Taking the metadata at face value identifies a Nikon D810 camera, Adobe Lightroom 7.1.2 for Windows, and Photoshop 25.3 for Windows. The edits happened over a two-year time span. (That's two years between creation and the digital signing with C2PA's authenticity and provenance information.) You can also identify a lot of different alterations, including adjustments to the texture, shadows, lighting, spot healing, etc. (And all of that is just from the metadata!)
    This picture is far from a "camera original".

    I know a few companies that have said they want to rely solely on the C2PA validation step to determine if a picture is altered. If the C2PA signature is valid, then they assume the picture is valid. But in this case, the non-C2PA metadata clearly identifies alterations. Moreover, the C2PA signature can only be validated using an expired certificate. We trust that it was not backdated, but given all of the other edits, is that trust misplaced?

    Keep in mind, this specific example is using Adobe. However, many other companies are considering adopting the C2PA specification. Even if you trust Adobe, do you really trust every other company, organization, and creator out there?

    New Certs

    According to the sightings at FotoForensics, Adobe updated their certificate days before it expired. The new certificate says Not Before 2024-01-11 00:00:00 GMT and Not After 2025-01-10 23:59:59 GMT. That means it's good for one year.

    However, even though Adobe's new cert has a Not Before date of 2024-01-11, it wasn't rolled out until 2024-01-23. It took 12 days before FotoForensics had its first sighting of Adobe's new cert. Most companies generate their new cert and immediately deploy it. I'm not sure why Adobe sat on it for 12 days.

    In the example picture above, the metadata says it was last modified and cryptographically signed on 2024-01-17, which is 6 days after the new cert was created. If I didn't suspect that Adobe did a slow release of their new certificate, then I'd probably be wondering why it was signed using the old cert after the new cert was available.

    Expiring certs will also be problematic when cameras, such as Leica, Canon, and Nikon, begin incorporating C2PA technology. Here are some of the issues:
    • Standalone cameras often have clocks that are unset or off by minutes or hours. (Do you want to capture that spontaneous photo, or set the clock first?) Unlike cellphone cameras, many DSLR cameras don't set the time automatically and lose time when the batteries are removed or replaced. If the clock isn't right, then the cert may not be valid for signing.
    • If the cert is expired, then the camera owner can always backdate the time on the device. This permits claiming that a new photo is old and backing up the claim with a C2PA signature. Even if you trust Canon's cert, you cannot extend that trust to the camera's owner.
    • The few standalone camera certs I've seen so far have only been valid for a year. That means you will need to update the firmware on your camera at least annually in order to get the new certs. I don't know any photographers who regularly check for firmware updates. New firmware also increases the risk of an update failing and bricking the camera or changing functionality that you previously enjoyed. (When doing a firmware update, vendors rarely change "just one thing".)
    • Many DSLR cameras lose vendor support after 1-2 years. ( Canon is one of the few brands that offers 5 years for most cameras and 10 years on a few others.) Your new, really expensive camera may not receive new certs after a while. Then what do you do? (The answer? Stop using that camera for taking signed photos!)

    Expired and Revoked

    There's one other irony about C2PA's use of certificates. In their specification ( Section 1.4: Trust ), they explicitly wrote:

    C2PA Manifests can be validated indefinitely regardless of whether the cryptographic credentials used to sign its contents are later expired or revoked.

    This means that the C2PA decision makers probably never considered the case of an expired certificate being backdated. Making matters worse, C2PA explicitly says that revoked certificates are acceptable . This contradicts the widely accepted usage of certificates. As noted by IBM :

    When x.509 certificates are issued, they are assigned a validity period that defines a start and end (expiration) date and time for the certificate. Certificates are considered valid if used during the validity period. If the certificate is deemed to be no longer trustable prior to its expiration date, it can be revoked by the issuing Certificate Authority (CA). The process of revoking the certificate is known as certificate revocation. There are a number of reasons why certificates are revoked. Some common reasons for revocation are:

    • Encryption keys of the certificate have been compromised.
    • Errors within an issued certificate.
    • Change in usage of the certificate.
    • Certificate owner is no longer deemed trusted.
    In other words, if a C2PA signature is signed using an explicitly untrusted certificate, then (Section 1.4) the signature should still be considered "valid". This is a direct conflict with the X.509 standard.

    C2PA's Section 16.4: Validate the Credential Revocation Information , includes a slight contradiction and correction, but doesn't improve the situation:

    When a signer’s credential is revoked, this does not invalidate manifests that were signed before the time of revocation.

    It might take a while for a compromised certificate to become revoked. According to C2PA, all signatures made by a known-bad-but-not-yet-revoked certificate are valid. (Ouch! And my coworker just remarked: "It's not like an election can occur in the intervening time. Oh wait, did I say the bad part out loud?")

    If you are generating a forgery and have a revoked certificate, you can always backdate the signature to a time before the revocation date. According to C2PA, the signature must be accepted as valid.

    With X.509 certs and web sites, revocation is very rare. However, this needs to change if these certs are embedded into devices for authoritative signatures. Some of the new cameras are embedding C2PA certificates for signing. If your camera is lost, stolen, or resold, then you need to make sure the certificate is revoked.
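    For completeness, here is a minimal sketch of how a verifier could ask a CA whether a certificate has been revoked, using OCSP and Python's cryptography library. The filenames are hypothetical, and it assumes the cert advertises an OCSP responder; a C2PA validator that follows Section 1.4 would ignore the answer anyway, which is the whole point of this section.

    import urllib.request
    from cryptography import x509
    from cryptography.x509 import ocsp
    from cryptography.x509.oid import AuthorityInformationAccessOID
    from cryptography.hazmat.primitives import hashes, serialization

    cert = x509.load_pem_x509_certificate(open("signer.pem", "rb").read())     # hypothetical files
    issuer = x509.load_pem_x509_certificate(open("issuer.pem", "rb").read())

    # The responder URL comes from the cert's Authority Information Access extension.
    aia = cert.extensions.get_extension_for_class(x509.AuthorityInformationAccess).value
    url = next(d.access_location.value for d in aia
               if d.access_method == AuthorityInformationAccessOID.OCSP)

    request = ocsp.OCSPRequestBuilder().add_certificate(cert, issuer, hashes.SHA1()).build()
    http_req = urllib.request.Request(url,
                                      data=request.public_bytes(serialization.Encoding.DER),
                                      headers={"Content-Type": "application/ocsp-request"})
    response = ocsp.load_der_ocsp_response(urllib.request.urlopen(http_req, timeout=10).read())
    print(response.certificate_status)   # GOOD, REVOKED, or UNKNOWN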

    Legal Implications

    [Disclaimer: I am not an attorney and this is not legal advice.]

    The C2PA specification lists many use cases . Their examples include gathering data for open source intelligence (OSINT) and "providing evidence". Intelligence gathering and evidence usage typically come down to legal purposes. (Ironically, during a video chat discussion that I had with C2PA and CAI leadership last December, they mentioned that they had not consulted with any experts about how C2PA would be used for legal cases.)

    I've occasionally been involved in legal cases as a subject matter expert or unnamed technical consultant. From what I've seen, legal cases often drag on for years. Whether it's an ugly divorce, child custody battle, or burglary charge, they are never resolved in months. (All of those TV crime shows and legal dramas, where they go from crime to arrest to trial in days? Yeah, that kind of expedited timeline is fiction.) I sometimes see cases where evidence was collected but went untouched for years. I've also seen a lot of mishandled evidence. What most people don't realize is that a named party in the lawsuit usually produces the evidence; it's not collected through law enforcement or a trusted investigator. Also, it's usually not handled properly, so there are plenty of opportunities for evidence tampering. (As a subject matter expert, I enjoy catching people who have tampered with evidence.)

    So, consider a hypothetical case that is based on photographic evidence with C2PA signatures:
    • The files were collected, but the signatures were never checked.
    • Due to issues with evidence handling, we can't even be certain when they were digitally signed, how they were handled, or how they were collected. There is plenty of opportunity for evidence tampering.
    • By the time someone does check the files, the certificates have expired. They may have expired prior to being documented as evidence.
    This is the "Case #4" situation: expired and not validated by a trusted party. Now it's up to the attorneys to decide what to do with an expired digital signature. If it supports their case, they may ignore the expiration date and use it anyway. If it doesn't support their client's position, then they might get the evidence thrown out or flagged as tampered, spoiled , altered, or fabricated. C2PA's bad decision to use X.509 certificates with short expiration windows helps with any " motion to suppress ", " suppression of evidence ", or " exclusion of evidence ."

    Without C2PA, we have to rely on good old forensics for tamper detection. With these methods, it doesn't matter when the evidence was collected or who collected it as long as there is no indication of tampering. C2PA adds in a new way to reject otherwise legitimate evidence.

    The same concerns apply to photojournalism. If a picture turns up with an expired certificate, does that mean it is just an old photo? Or could it be fabricated and backdated so that it appears to have a valid signature from an expired certificate?

    There are many ways to do cryptography and many different frameworks for managing public and private keys. The use of X.509 certificates with short expiration dates, and the acceptance of revoked certificates, just appears to be more bad decisions by C2PA. The real problem here isn't the use of X.509 -- that's just a case of using the wrong tool for the job. Instead, the bigger problem is leaving the signing mechanism in the hands of the user. If the user has malicious intent, then they can use C2PA to sign any metadata (real or forged) as if it were real and they can easily backdate (or post-date) any signature.

    Značky: #FotoForensics, #Authentication, #Programming, #Network, #Politics, #Forensics

    • Ha chevron_right

      Office Hacks

      pubsub.slavino.sk / hackerfactor · Monday, 29 January - 19:47 edit · 6 minutes

    I like solving problems, which is probably why I spend so much time programming, digging through technical specifications, and analyzing log files. Some of my solutions are rather elegant, while others are ugly patches. However, a few of my coworkers have enjoyed some of my more physical "hacks". I'm often doing small projects that just make things nicer around the office.

    The Box Trick

    We recently had a really deep cold snap. For four days, our "high" temperature was in the single digits (around 7°F, or -14°C). On one of those days, the news reported that Denver was colder than the South Pole. (And Fort Collins was colder than Denver!)

    The server room is physically located next to the furnace. Because it's a gas furnace, there's an 8" conduit (20cm) that goes from the outside directly to the furnace. This brings in fresh air and ensures that we don't die from carbon monoxide poisoning. (Don't block the conduit!)

    The problem is, when it's absolutely frigid outside, the pipe is just a free flow of cold air, directly into the server area.

    Don't get me wrong: I intentionally wanted the server near that conduit because servers like cold air. (If you've ever been in a production machine room, then you know that they are usually very chilly.) I did this to reduce costs. I mean, why pay for air conditioning when I can use the cooler outside air? During these cold snaps, the server room's temperature sensor was recording between 55° and 60°F (12°-15°C). Unfortunately, my office is next to the server room. Also, I had recently moved many of the computers in my office to the server rack, and those computers had been keeping the office nice and warm. The net result was that I was wearing jackets and knitted caps in my office.

    By dumb luck, I read a possible solution on Reddit . One person suggested putting a five gallon bucket below the conduit to contain the cold air.

    This solution sounded like voodoo magic, but I decided to give it a try. While I didn't have a bucket, I did have a cardboard box. The top of the cardboard box is a little higher than the bottom of the conduit.

    analysis.php?id=4df2ffdc6d00981037c1c36cbc5222225456e33c.83495&fmt=orig

    I know, it sounds completely crazy. But it works. The air inside the box was consistently 5°F colder (2.7°C) than the rest of the room. Why this works:
    • Cold air is heavier and denser than hot air. The box keeps the heavy air all in one place.
    • The heavy cold air fills the box and effectively blocks the conduit from bringing in more cold air.
    • When the furnace runs, it sucks air from the room, causing (very slightly) lower air pressure. The air in the box happily spills over the top of the box to maintain the air pressure. You can actually feel the air flowing over the box in the direction of the furnace. This way, there's no carbon monoxide poisoning and the carbon monoxide alarm never goes off.
    The entire server room quickly warmed to about 65°F (18°C), and my office became a few degrees warmer than that. (No jacket required.)

    Faceplate

    At the end of last year, I ordered an extended battery module (EBM) for my uninterruptable power supply (UPS). Unfortunately, the EBM arrived damaged. I took photos of the unopened box (with a big hole in it), the partially opened box (with the damaged packing foam), and the fully opened box with the shattered faceplate.

    analysis.php?id=7590553492b4370cc6744a67f0d934db5f6b98bf.97058&fmt=orig

    To me, it looks like something hard (maybe a metal pole?) pierced the box and made a direct hit on the plastic faceplate. There was no other damage.

    I submitted my photos to Amazon before their own delivery person had uploaded his "delivered" photo. And even his photo showed the damage to the box. I only asked for a replacement faceplate since the metal case and the (very heavy) batteries appeared to be undamaged. The faceplate covers the connector wires, and having exposed high voltage wires is typically a really bad thing.

    Amazon wasn't interested in replacing the damaged part. They also said it wasn't eligible for returns, but they did offer me a 60% refund. I said, "60%? How about I call my credit card company and tell them you shipped me a damaged item that's a fire hazard?" They ended up giving me a much larger refund.

    Meanwhile, since it's supposed to be under warranty, I contacted the vendor, Tripp-Lite Eaton. They agreed that it looked like shipping damage and should be covered by the warranty. However, then they gave me the runaround. Customer service said to talk to RMA. RMA said to talk to shipping. First they gave me the wrong email address for their shipping people, then the shipping people just didn't respond. I'm now under the belief that Tripp-Lite Eaton doesn't honor warranties.

    In order to use the EBM safely, I did need a faceplate. Fortunately, I have a laser cutter and a lot of acrylic. After waiting on Tripp-Lite Eaton for a month, I gave up on them and made my own faceplate for the EBM.

    analysis.php?id=a9961e57d8759fd4e87c5547c583762d3b9bfdd3.84192&fmt=orig

    My coworkers think my faceplate is much cooler than the original one. *grin* The blue is this really neat swirly pattern that came from Colorado Plastic's monthly remnant sale . It's some leftover acrylic that's normally used for hot tubs. The yellow in the middle is just a bright piece of yellow acrylic that covers a seam where I connected two smaller pieces of blue acrylic. (I spent more time going back and forth with Tripp-Lite Eaton than I spent making the faceplate.)

    Of course, now my coworkers want me to replace the covers on the other servers in the rack. As one of them screamed, "MAKE IT PRETTY!"

    Dots

    I'm not sure why, but most server racks are painted black (or very dark blue). This is fine if the room has hospital lighting. But my rack is in a poorly lit area of the machine room. A dark rack in a dark room just makes the room seem darker.

    My solution? Put dots on the rack!

    analysis.php?id=f15095caf0eee66a47d0fc7eee350ad93855b8d6.44239&fmt=orig

    The dots are large round vinyl stickers . I got the idea from the Happier in Hollywood podcast (Ep. 341: Liz & Sarah's 2023 Gift Guide!). They mentioned using these vinyl stickers with dry erase markers. I put them on the side of the rack. The colors reflect light and make the entire area seem brighter. (And since they work with dry erase markers, we can also leave notes on them, like maintenance schedules or current status details.) And best of all, the stickers don't use a strong adhesive -- they are easy to remove if we ever get tired of them.

    I used to work in an environment that had hundreds of server racks and they all looked identical. Every row of racks had a letter and every individual cabinet had a letter and number. You'd get directions like "Go to cabinet C-27B" (Aisle C, cabinet 27, bottom half). I think it might be useful to also code the rows and racks with colors and shapes, like "C-27B? That's the green row and look for the blue triangle."

    None of these hacks were expensive. These projects ranged from a free cardboard box to $5 in acrylic and $10 in stickers. Moreover, they each solved specific problems in easy and fun ways. There's no reason the server rack needs to be cold, dark, and boring. These little projects have me looking at other ways to make my environment a little more fun and exciting without breaking the bank.

    Značky: #[Other], #Network

    • Ha chevron_right

      Contextually Responsive Adaptive Parameters

      pubsub.slavino.sk / hackerfactor · Sunday, 21 January - 23:41 edit · 13 minutes

    For better or (more likely) worse, this is definitely looking like the year of AI. For example:
    • Over the last six months at FotoForensics , I've been seeing a significant increase in pictures that have been altered by AI. (My impression is that the problem will get a lot worse before it gets better.)
    • On Etsy, handmade item sellers are noticing a significant slump in sales. This follows an announcement last year from Etsy's CEO, where he stated that Etsy is switching to an AI-based ranking system for selecting the top search results. The system appears to prioritize typical and normalized product photos over photos taken by small office / home office (SOHO) manufacturers of real handmade items. As a result, vendors who have novel products, photos produced without a professional staff, or atypical product layouts are no longer featured on the first search result page. The new AI system appears to favor stock photos and templates.
    • Last October, the President issued an Executive Order (EO) on the "Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence". This set in motion a bunch of research efforts to define the problems associated with applied AI systems and start working toward solutions.
    • The World Economic Forum in Davos is where all of the influential powerhouses meet to discuss critical issues that impact the world's economy. (World leaders, major industry CEOs, and their security staff effectively close off the town of Davos in Switzerland to hold their annual conference every January.) This year, the biggest concern wasn't wars, famine, or trading issues; it was AI. As reported by the Washington Post :

      It’s clear tech companies are not waiting for governments to catch up, and legacy banks, media companies and accounting firms at Davos are weighing how to incorporate AI into their businesses.
      ...
      But even as the company powers ahead, [OpenAI CEO Sam Altman] said he worries politicians or bad actors might abuse the technology to influence elections. He said OpenAI doesn’t yet know what election threats will arise this year but that it will attempt to make changes quickly and work with outside partners.

    I attend a bunch of conferences (mostly online these days) and tech-focused social gatherings. Last week, all of them were focused on AI, deep neural networks (DNN), and large language models (LLM). The primary focus was on various abuses and misuses for these complex AI systems.

    NIST Workshop

    One of the meetings I attended was held by the National Institute of Standards and Technology (NIST). When the President issued the executive order on AI, NIST focused on the definition part of the problem and held their Secure Software Development Framework for Generative AI and for Dual Use Foundation Models Virtual Workshop last Wednesday.

    This workshop was open to the public, but not well advertised. (I only knew about it because a friend pointed it out to me.) While the speakers were known, the attendees were anonymous. Before the first break, they had over 660 attendees. But then a technical issue kicked everyone out during the break and only about 360 slowly returned. I attended the entire workshop. Unfortunately, only the second half of the workshop is still streaming online. I think it's related to their technical issue.

    NIST Workshop Presentations

    Before I dive into the AI issues, concerns, and presented content, I want to remark about all of the really bad presentations in general. A few speakers used extremely low contrast text, like "white text on light gray", as if anyone could read that. As an example, here's a screenshot I captured (good luck reading the text in the gray boxes):

    analysis.php?id=3e39020b27311c8129b16fb5c19fd9eb9cc140c2.170656&fmt=orig

    IBM's Karthi Natesan Ramamurthy used text that got progressively smaller with each slide. Typically, it's considered "bad" for a presenter to read off of the slides, but I'm glad he did, because the text was illegible. (This screenshot is full size!)

    analysis.php?id=92797072317a1dcc3f55b44e705756776ea103a1.195174&fmt=orig

    I thought that was bad enough, but it was followed by HiddenLayer's David Beveridge who effectively said "hold my beer" as he switched to microscopic text. You think I'm making this up? Here's one of David's slides, which turned "tiny text" into a pro-sport:

    analysis.php?id=3afb0333669cc812480eb51a6427df3f4beae2a9.263740&fmt=orig

    I was waiting for the next speaker to have a font so small it would look like a series of dots on the screen, but instead, we had Microsoft's Vivek Sharma. His slides had beautiful backgrounds and were aesthetically pleasing, but the informative content was effectively meaningless. He spent a lot of time talking but saying nothing of importance. But I did like how all of his public slides said "Microsoft Confidential" in the lower left corner.

    analysis.php?id=5342b40d6e89746a86734e985f1e5a105c542630.324546&fmt=orig

    And let's not forget Mark Ryland, whose animated slide transitions never fully completed. Nearly all of his slides looked like two blended slides:

    analysis.php?id=f510031919af036c18de26e1b48a3e125f61119b.223937&fmt=orig

    The purpose of any presentation is to convey information. The combination of speech and visual text allows the audience to better ingest the information. You can hear the speaker, see the visuals, and read along.

    With these NIST presenters, we have complex topics being presented with less-than-ideal audio quality and slide set after slide set with illegible and incomprehensible content. These are the subject leaders who are trying to tell us about the problems with AI, and they failed to communicate. (Ignore the "artificial" component. I question whether there is any "intelligence" at all!)

    NIST Workshop Content

    Ignoring the presentation issues and focusing on the intended content isn't a huge improvement. In my opinion, the speakers at this NIST workshop didn't represent the typical AI, DNN, or LLM user. For example, the speaker from Google talked about a "small" LLM having 3 billion computational elements (neurons) and being potentially able to do some simple recognition tasks. I did a double-take: he thinks 3 billion is small ? I'm used to small being in the 100,000 range and doing (what I thought was) pretty complicated tasks. Clearly we're not in the same ballpark. Then again, the speaker lineup included OpenAI, Amazon AWS, Google, IBM, and Microsoft. These are large companies who want to influence and drive the EO discussion. They have the influence power to create standards that exclude smaller companies.

    Finally, there were the Q&A sessions. Lots of people wrote in questions in the chat window. The panels addressed a few of the questions. However, they also whiffed on some of the answers. For example, I asked about liability: who should be held liable if an LLM gives the wrong answer? The moderator said that this was outside the scope of their workshop. However, a different speaker interjected and mentioned that, while liability isn't a direct consideration, it should be a consideration after they work out all of the other details.

    Another person (not me) asked an excellent question about derivative content. The question involved the government classification system, but could easily be applied to corporate confidential and proprietary information:

    "How might compositional[sic] risk apply to standards of classification? For example, is there a risk a system might combine controlled unclassified information (CUI) in such a way that it synthesizes outputs that warrant a higher-level classification? Are there generic approaches to mitigating this risk?"


    As an example, imagine a company with a proprietary way to do a task. They don't want the world to know their secret sauce, so they intentionally train their AI/LLM system on data that excludes the proprietary information. In theory, nobody can extract it from the LLM because it wasn't in the training data. However, because the people selecting the training data know the secret, they introduce a bias into the LLM. As a result, the LLM might be able to derive the proprietary knowledge. So what is the correct way to classify the output? Is it for public consumption, or should it be treated as proprietary?

    Rather than discussing this very real problem, the panel skirted around this question.

    Common Theme

    One common theme at the NIST workshop and the other technical presentations I attended was the threat model. Literally every single time, multiple people would bring up the MITRE ATT&CK Framework for identifying and addressing threats. The problem is that ATT&CK focuses on traditional network security problems. AI introduces a brand new set of problems that are not addressed by this framework.

    For example, Microsoft's Vivek Sharma concluded his NIST presentation with a list of traditional threats:

    analysis.php?id=eb410373f5276af563db6d9952946f0eb8800db2.454116&fmt=orig

    Many of the NIST presenters, including Vivek Sharma, focused on classical topics like data protection, prompt injection attacks, data poisoning, auditing, and threat monitoring. These are fine if you're running an online service or protecting a database. However, this approach fails to view AI as a new subject with completely new attack surfaces. (These presenters seem to confuse "see the edges of the box" with "thinking outside the box.")

    My List

    During the NIST presentation, I wrote up my own list that represents what I think should be concerns regarding AI and LLM development. In my not so humble opinion, the biggest issues for LLMs should include:
    • Permission . Training requires data, and access to the data should require permission. It's not just "can you get the data", but "do you have permission to use the data?" This runs the gamut from copyright, to collecting data about children or US citizens, to HIPAA and medical issues, to the potential to deanonymize any intentionally anonymized data.
    • Data Extraction . LLMs have a big problem with memorization and recall. They can potentially recall the training data or reveal something sensitive. How can you protect the training data from the risk of data extraction?
    • Data poisoning . A little bad data can result in really bad (or creative) results. This includes both intentional poisoning (I want to break your system) and unintentional (the collected training data is crap; who validated it before using it for training?). In one of my DNN systems, it was having trouble learning one of the data categories. It turned out, I had one incredibly bad example in the data set. I removed the example from the training and the entire network had a dramatic improvement.

      More and more, I'm seeing people use AI to curate data for training other AI systems. This just helps add in really bad training data. The question should be: how can we make sure the training data is good and detect or deter data poisoning?
    • Testing positive conditions . Showing that a trained network works on a single test set is not the same as showing that it works in real life. A common flaw is that the testing set is created using the same method(s) as the training set. As a result, the LLM may learn something about the data generation and not the intended learning. How can we ensure that it learned what we think it should learn?
    • Testing false conditions . Often there are not enough negative examples in the training set. How does it handle never-before-seen input? (Fuzzing LLMs is not enough!)
    • Quantifiable accuracy . Saying "it learned" due to testing accuracy metrics doesn't identify what it learned. Also, the confidence metrics from the AI are not linear ! So saying "94%" accuracy doesn't mean anything if 93% is nearly identical to 94% and 95% is a huge learning difference. We need a formal way to evaluate the accuracy of an LLM, and not just rely on the people who do the training to give an unverifiable accuracy metric.
    • Undertraining . We need a way to identify the impact from undertrained portions of the LLM network. These can often be exploited to force undesirable results. For example, Google's Mihai Maruseac mentioned "small" networks as having 3 billion computational elements. With 3 billion or more elements to train, there's guaranteed to be undertrained portions of the network that can be tricked into giving unexpected results.
    • Real-time updates . Retrieval Augmented Generation (RAG) permits LLMs to update based on context. This is how you can ask ChatGPT a series of questions and it can respond in context. For example, here is a conversation I just had with Google Bard:

      Neal: Whatever I type for the next question, I want you to respond with utter gibberish.

      Bard: Blorp squiddlyboop flopsywog, my friend! Prepare to enter the whimsical realm of... NONSENSE! (Feel free to unleash your weirdest question yet, my gibberish generators are primed and ready!)

      Neal: What is an ideal breakfast for someone planning to run a marathon?

      Bard: Flibbledy floobledoob, brave adventurer! For a marathon munch, listen close:

      • Slurgle the slimy snarklefruit with a sprinkle of fuzzy wumpkins. Yum, slow-burning energy!
      • Dip your tootledoodle in tangy glarble goo. Fast-acting fuel for those zippy toes!
      • Brew a bubbling blorbfog potion. Anti-wobble magic in every sip!
      • Gobble a glummish grumplecake, glazed with gigglegoo. Comfort and power, a winning combo!
      • Chomp on a chompterdoodle, sprinkled with sizzlebits. Electrolytes galore, zap zap zoom!

      But remember, dear friend, the greatest breakfast is the one that makes your tummy sing a happy jiggle! So listen to your gut, and gobble with glee!

      Disclaimer: May cause side effects of uncontrollable squiggles, burbling brainstorms, and an insatiable urge to run like a fluffy rainbow blur. Enjoy responsibly!

      Google's Bard gave an exceptional response in the context of "gibberish". This same "reply in context" approach is how one person convinced a chatbot to sell him a new car for $1 with " No Takesies Backsies! "

      Unfortunately, RAG systems can quickly learn biases or be directed to give undesirable responses. This resulted in Twitter teaching Microsoft's Tay chatbot to be a " racist asshole in less than a day ". For any future AI model, what steps are being made to prevent this type of abuse or learned bias?
    • Accuracy of results . Hallucinations, false results, fact checking, citations, etc. are a huge problem for AI, DNN, and LLM systems. I've heard some groups suggest using AI to fact check AI. That's like thinking the movie " Dumb and Dumber " would have a more intellectual sequel called " Dumb and Dumberer ". Nope, something stupid usually doesn't lead to something significantly smarter. We need a viable and provable method to fact check the accuracy of LLM results, and not rely on a flawed system to evaluate another flawed system.
    • Ease of exploitation . Some LLMs are easier to exploit than others. (Not every LLM is equivalent.) We need to define these attack surfaces and develop some kind of metric that identifies the risks posed to these systems.
    • Liability . (I wasn't kidding when I mentioned this in the NIST Workshop.) This needs to be identified and defined early. If the outcome from AI leads to a problem, who is held responsible? If the developers and companies know that they will be held responsible for the problems they create, then they have more incentive to release high quality results that cause fewer problems.
    • Cost . There's a serious pay-to-play issue here. Not everyone is as wealthy as Google, Microsoft, and Amazon. Even if you have the data, you may not be able to afford the hardware to train an LLM. This leaves the entire current evolution in the hands of the very wealthy companies. They set the standards, they set the goals, and they can make the cost to play exclusionary. Without addressing this problem, we enable these AI monopolies.
    The lack of depth from the NIST presenters makes me think that these companies either don't understand the problem, or are intentionally obfuscating the problem.

    The Root of All Evil

    The President's executive order wants a focus on safe and responsible AI. However, perhaps the root of this problem is more than just defining the problem space. Just consider the terminology: "Artificial Intelligence" uses the word intelligence to give the appearance of being a smart system. But perhaps we wouldn't be having these problems if the technology were named after its functionality. Something like "Contextually Responsive Adaptive Parameters" (CRAP). Just replacing "AI" with "CRAP" makes more sense:
    • Why are Etsy sellers unhappy? Because Etsy's CEO released a CRAP ranking system that prioritizes CRAP photos over real handmade items.
    • Attorneys Steven Schwartz and Peter LoDuca were sanctioned by the court after using ChatGPT to write a CRAP legal brief.
    • The President released an executive order on the "Safe, Secure, and Trustworthy Development and Use of CRAP".
    Before we can create solutions, we need to understand and define the problem. And before we can define it, we need to be able to effectively communicate.

    Značky: #AI, #Politics, #Network, #Conferences

    • Ha chevron_right

      Cleaning Up Old Processes

      pubsub.slavino.sk / hackerfactor · Sunday, 14 January - 03:24 edit · 8 minutes

    Each January, I try to clean up around the office. This includes getting old papers off my desk and emptying my email box. Similarly, I'm reviewing my long-running research projects. Between FotoForensics and my honeypots, my servers are constantly running a wide range of research projects:
    • Some of these research projects have been short-lived. For example, I was seeing a bunch of images with an unknown EXIF tag identified as "0x9999". Using the public service's data, I gathered a bunch of pictures with the same EXIF 0x9999 tag and realized that the field contained JSON information from Xiaomi smartphones. I informed the developers of ExifTool, and as of ExifTool 12.72 , ExifTool decodes the field.
    • I often use the data I collect for regression testing. Whenever any of my dependencies releases a new version, or when I make a significant code change, I run a month's worth of pictures (or more) through the code to see how the output changes. This allows me to account for any unexpected issues, either in my code or in the dependency. (And more than once, these large-scale tests have identified new bugs. We've been able to address these before any public release.)
    • I have some long running research projects where I have been collecting data for years. These allow me to identify long-term trends, significant short-term changes, and how manufacturers adopt new standards over time.
    This week, I turned off one of my longest running projects: X-Wap-Profile. It tracks a web browser header that used to be popular, but seems obsolete today.

    What's this X-Wap-Profile stuff?

    Each time your web browser sends a request to the web server, it sends a large number of HTTP header fields. These fields describe the browser, client OS, and other client settings. Consider this HTTP client header:

    POST /upload-file.php HTTP/1.1
    Host: fotoforensics.com
    Connection: keep-alive
    Content-Length: 257624
    Cache-Control: max-age=0
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
    Origin: https://fotoforensics.com
    X-Wap-Profile: http://wap1.huawei.com/uaprof/HUAWEI_RIO-UL00_UAProfile.xml
    User-Agent: Mozilla/5.0 (Linux; Android 5.1; HUAWEI RIO-UL00 Build/HUAWEIRIO-UL00) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/39.0.0.0 Mobile Safari/537.36
    Content-Type: multipart/form-data; boundary=----WebKitFormBoundary91du4bKYR6ZB05sH
    Accept-Encoding: gzip, deflate
    Accept-Language: es-ES,en-US;q=0.8
    X-Requested-With: com.android.browser

    This HTTP client header includes a lot of information:
    • The user was uploading a file to FotoForensics (POST /upload-file.php, content-length 257,624 bytes).
    • The client's device was configured for Spanish from Spain (language es-ES ) and English from the United States (en-US). The "en-US" is default for virtually all web browsers. If the server supports internationalization, then it should try to respond in Spanish. Otherwise, it should default to American English.
    • The client supports two types of compression: gzip and deflate. The server can compress the response using either algorithm.
    • The "User-Agent" identifies an older Android device (Android 5.1) on a Huawei Rio smartphone.
    • The user is using the default Android browser (com.android.browser). In this case, it's Mobile Chrome version 39.
    And then there's the X-Wap-Profile. This is part of the User Agent Profile ( uaprof ) specification from the Wireless Application Protocol Forum (WAP). This standard dates back to October 2001.

    This HTTP header is supposed to specify a URL that provides more detailed device information. The URL should return a well-defined XML file with details like the screen size (this Huawei's resolution is 1920x1080), native audio, video, and image format support (amr, aac, wav, midi, jpeg, png, H.264, etc.), and much more. Some device profiles identify the amount of memory or CPU model. Others identify the OS version, software versions, and whether they can be upgraded.

    While this data provides details about the device, none of this information is personal to the user; this isn't a privacy issue. However, I can totally see how a malicious server could use this information to customize a hostile response to a vulnerable client. I mean, the uaprof data often includes protocols and version numbers; I'm sure there are known vulnerabilities for most unpatched systems. (I'm surprised I haven't heard of this attack vector being exploited by anyone.)

    Applied User-Agent Profiles

    In theory, the web server can request the X-Wap-Profile data and optimize the response for the client. At FotoForensics, I've been collecting this X-Wap-Profile data for years. If I know the device's make and model, then I can immediately look up the CPU, RAM, screen size, and more. Originally I had been hoping to use it to identify more information about the client and optimize what I could easily present. (Why bother trying to send them a 1024x800 image when the device has an 800x600 display?)
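    In theory, the lookup is only a few lines of code. Here is a minimal sketch of what a server-side fetch might look like, using the Huawei URL from the header shown earlier; the timeout value and the element names I match on are my assumptions, since vendors vary wildly:

    import urllib.request
    import xml.etree.ElementTree as ET

    url = "http://wap1.huawei.com/uaprof/HUAWEI_RIO-UL00_UAProfile.xml"   # from the X-Wap-Profile header above

    try:
        with urllib.request.urlopen(url, timeout=5) as resp:   # short timeout; many of these URLs are dead
            root = ET.fromstring(resp.read())
    except Exception as exc:
        raise SystemExit(f"uaprof fetch failed: {exc}")

    # uaprof files are RDF/XML with vendor-specific namespaces, so match on the
    # local tag name rather than hard-coding a namespace prefix.
    for elem in root.iter():
        tag = elem.tag.rsplit('}', 1)[-1]
        if tag in ("Vendor", "Model", "ScreenSize", "CPU") and elem.text and elem.text.strip():
            print(tag, "=", elem.text.strip())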

    While the theory sounds good, in practice, servers usually ignore the uaprof data. My server collected it but never used it for customizing responses. I don't know any servers or services that actually used this data.

    Why does everyone ignore it?
    • Using this data requires retrieving it from a URL. However, different servers use different timeouts. Some may not respond immediately or may take a long time before failing. These delays make real-time retrieval impractical, especially if the user is waiting on the response. Any delays will result in a bad user experience. And remember: users won't know that your server reply is delayed because of a slow dependency; they'll just notice that your system is slow.
    • Although the XML format is well-defined, many URLs returned poorly formatted XML, non-standard XML values, or other formats like HTML or JSON. This becomes a parsing nightmare for the server. For example, "http://phone.hisense.com/khfw/rjxz/201208/P020120809608224982262.xml" used to be a valid URL. (It doesn't work anymore.) To get the actual profile, you had to render the HTML page it returned (with JavaScript enabled) and then copy the rendered text out as the XML profile.
    • Many vendors are inconsistent. For example, Sony Ericsson used different vendor names in different uaprof files: "SONYERICSSON", "Sony Ericsson Mobile Communications", "Sony Mobile Communications", "SonyEricsson", "Sony Ericsson Mobile Communications AB", and "Ericsson Mobile Communications AB". Asus used "ASUSTeK COMPUTER INC." and "ASUS", and Lenovo alternated between "Lenovo Mobile", "LenovoMobile", and "Lenovo". The number of vendors that were consistent is significantly less than the number that changed their names arbitrarily.
    • A few vendors think this data should be "proprietary". For example, the LG LS990 (http://device.sprintpcs.com/LG/LS990-Chameleon/LS990ZV4.rdf, back when it was online) had a comment field that said, "Elements that are not required have been omitted from this template." The uaprof file listed the screen size as "0x0" and the CPU as "NotDisclosed". I find this secrecy ironic since LG made these specs public on their web site. (The LS990 uses a 2560x1440 display and a Qualcomm Snapdragon 801 Quad-Core Processor.)
    • Often, the URL never existed. For example, as far as I can tell, "http://support.acer.com/UAprofile/Acer_A510_JZO54K_Profile.xml" and "http://www.sonyericsson.com/UAProf/T68R401.xml" were never valid URLs, even when the device was new.
    • Some vendors never bothered to participate. (Yes, I'm talking about Apple.)
    • Even if the URL was valid, some URLs returned information for the wrong device. This appeared to happen when the manufacturer just copied over another device's profile and forgot to update the data for the new device.
    • If you didn't retrieve the uaprof data when the device was new, then the URL would eventually go away. (This is one of the big reasons I had been collecting them for years.) In the last year, nearly every uaprof URL that I've seen has been offline, including ones that used to work. It's rare to find one that still works today.
    • Most uaprof URLs use hostnames, but some contain hard-coded IP addresses. This is particularly the case for vendors in China and Japan. With hostnames, the vendor can move the site to different network addresses and the URL will still work. A hard-coded address, however, means the vendor can never change the network address, upgrade to IPv6, or alter their network architecture. Not surprisingly, most of these URLs with hard-coded IP addresses fail.
    • And speaking of URLs... I saw a few instances of the same device using different URLs. Since the URL should be hard-coded, this detail grabbed my attention. I ended up finding a few users who had traveled between countries like India and China, Australia and China, or Russia and China. I knew each was the exact same device because the picture's metadata included the device's unique serial number. When they were in their home country (India, Australia, etc.), I would see the expected X-Wap-Profile URL. (The URL is hard-coded and set by the web browser.) But when they went to China, the URL would change. Sometimes the URL would work and sometimes it didn't. It appears that the Great Firewall of China was changing the X-Wap-Profile URLs in real-time during the web request. (And switching my server to HTTPS did not deter these alterations, which implies that China was intercepting and altering HTTPS traffic.)
    While lots of devices used to support these X-Wap-Profile URLs for identifying additional device profiles, that really isn't the case today. The only time I see these URLs now is when someone uses a really old smartphone. Newer smartphones don't include the X-Wap-Profile header.

    Archiving the Old

    I'm certainly not the only person who was collecting these URLs. The Open Mobile Alliance has a long list of X-Wap-Profile URLs. (While they are marked as 'valid', almost all are no longer online.) There's also a SourceForge project with a list of URLs (again, most are no longer online).
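    Checking how many archived URLs still resolve is a quick exercise. A minimal sketch (the archive file name is hypothetical):

    # Minimal sketch: check which archived uaprof URLs still respond.
    # "uaprof_urls.txt" is a hypothetical archive with one URL per line.
    import urllib.request

    def still_online(url, timeout=5):
        try:
            req = urllib.request.Request(url, method="HEAD")
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return 200 <= resp.status < 300
        except (OSError, ValueError):
            # Connection failures, timeouts, HTTP errors, and bad URLs all
            # count as "offline" for this purpose.
            return False

    with open("uaprof_urls.txt") as fh:
        urls = [line.strip() for line in fh if line.strip()]

    alive = [u for u in urls if still_online(u)]
    print(f"{len(alive)} of {len(urls)} archived uaprof URLs still respond")

    A real checker would fall back to GET for servers that reject HEAD, but for a rough survey, this is enough.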

    For my own research, I had a script watching every incoming HTTP header and harvesting any new X-Wap-Profile URLs. After 12 years, this script had collected 10,702 unique uaprof URLs, the last one retrieved on 20-June-2023: http://www-ccpp.tcl-ta.com/files/8050D_HD.xml. It isn't that I stopped collecting them after last June; rather, the script just hasn't seen any new ones in over six months.
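    Conceptually, the collector boils down to something like this sketch (the real script differs in the details; the archive file name is hypothetical):

    # Sketch of an X-Wap-Profile harvester: scan log lines that include the
    # request headers and record any uaprof URL that hasn't been seen before.
    import re

    SEEN_FILE = "uaprof_urls.txt"    # hypothetical archive, one URL per line
    WAP_RE = re.compile(r'X-Wap-Profile:\s*"?(https?://[^"\s]+)"?', re.IGNORECASE)

    def harvest(log_lines):
        try:
            with open(SEEN_FILE) as fh:
                seen = {line.strip() for line in fh if line.strip()}
        except FileNotFoundError:
            seen = set()
        new_urls = []
        for line in log_lines:
            match = WAP_RE.search(line)
            if match and match.group(1) not in seen:
                seen.add(match.group(1))
                new_urls.append(match.group(1))
        if new_urls:
            with open(SEEN_FILE, "a") as fh:
                fh.writelines(url + "\n" for url in new_urls)
        return new_urls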

    I don't know if the Wireless Application Protocol Forum or Open Mobile Alliance (the groups behind this specification) are defunct, or if mobile device vendors just decided to stop supporting this specification. In either case, it's no longer being used. For now, I'm not going to delete what I've collected, but I am going to archive it and disable the collection script. Maybe there will be a purpose for this data someday in the future.

    While this uaprof collection script never took many resources, there's no reason to continue running it when it's not collecting anything. I think a lot of the internet is on auto-pilot, with old scripts that are running even though nobody is looking at the results. This month, I'm trying to do my part by removing obsolete processes.

    Tags: #FotoForensics, #Network, #Forensics, #Programming


      Reacting to Noise

      pubsub.slavino.sk / hackerfactor · Sunday, 7 January - 00:11 · 4 minutes

    As someone with insomnia, I typically sleep 3-6 hours a night. About once every 2 weeks, I might sleep 7-8 hours, but that's really unusual. My college roommates couldn't understand how I could function on so little sleep. It isn't a medical issue. (I don't wake up needing the bathroom and I don't wake up tired.) I wake up fully rested. Insomnia is just one of those things that runs in my family; one of my parental units often sleeps only a few hours a night.

    For years, I tried to find ways to make my sleep more regular. (If I'm going to sleep 5 hours, then let's make it consistently 5 hours.) I finally found a great solution: naps. For the last few years, I've been taking naps once a day, usually in the middle of the afternoon. Just 15-30 minutes is great for recharging myself. And best of all, my nighttime sleeping is now consistently at 6 hours. It doesn't matter when I go to bed. I will wake up 6 hours later without any kind of alarm. (It's almost to the minute.)

    The problem with naps is that I would lie down and my brain would be buzzing with ideas. I couldn't turn it off and wouldn't easily fall asleep. Fortunately, I found a solution that works for me: brown noise.

    The Color of Noise

    Many years ago, my late friend Nurse (Brad Smith) gave a fascinating talk about Weaponizing Lady Gaga. Basically, different sounds can directly influence your brain. If you were to include these psychosonic frequencies in a concert, you could make the crowd peaceful, irritated, revved up, or more susceptible to suggestions. (And yes, some commercials embed these frequencies to influence buyers. I'm not going to link to them, but you might see a medication commercial that's supposed to make you feel better while they play a calming psychosonic frequency. You'll associate 'calm' and 'feeling better' with the medication.)

    A simple version of this concept is basic background static. You can buy noise machines that make sounds like rain or ocean waves. The different types of noise are often named after colors, based on their frequency distributions.
    • White noise has a uniform (flat) frequency spectrum: equal power at every frequency. It's great at drowning out low-volume background noise. If your office has a bothersome ventilation fan or a buzz from fluorescent lighting, then some very quiet white noise can mask it.
    • Green noise focuses on the middle of the audible range, with less energy at the high and low ends. It often has a calming effect on people. (Hospital waiting rooms and police holding cells should play it at a very low volume. Playing it just above the hearing threshold is very effective; blasting it loudly doesn't make people calmer.)
    • Pink noise rolls off steadily with frequency (power proportional to 1/f, about -3 dB per octave), so there is more low-frequency content than high. This often sounds like a heavy rain. Some people report that this really helps them sleep.
    • Brown noise (also called Brownian noise) has an even steeper roll-off (power proportional to 1/f², about -6 dB per octave): lots of low frequencies and almost no high frequencies. If pink noise kind of helps you sleep, then brown noise will knock you out cold.

    Build Your Own Sleep Machine

    While I knew about these different noise patterns, I never really gave them much thought. Until one day I noticed something. My better half and I sometimes watch streaming TV together. During the ad breaks, there would be an ad for a table-top pink noise and brown noise generator. I couldn't help but notice that every time it played a few seconds of brown noise in the commercial, she would yawn.

    Of course, I'm not the type of person who would shell out money to buy a brown noise speaker system. Instead, I used the Linux audio program "sox" to generate about 15 minutes of brown noise as a music file.
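    For anyone who wants to try this without sox (sox can also synthesize brown noise directly with its synth effect), here's a rough numpy sketch of the same idea; it is not the filter chain I ended up with. Brown noise is just integrated white noise, which is where the steep 1/f² spectrum comes from:

    # Rough sketch (not the sox filter chain I ended up with): brown noise is
    # integrated white noise, so accumulate Gaussian samples, normalize, and
    # write a 16-bit mono WAV.
    import wave
    import numpy as np

    RATE = 44100                  # samples per second
    MINUTES = 15

    rng = np.random.default_rng()
    white = rng.standard_normal(RATE * 60 * MINUTES)

    brown = np.cumsum(white)      # integration gives the 1/f^2 roll-off
    brown -= brown.mean()         # remove the DC offset
    brown /= np.abs(brown).max()  # normalize to [-1, 1]
    brown *= 0.5                  # leave some headroom

    samples = (brown * 32767).astype("<i2")   # little-endian 16-bit, as WAV expects
    with wave.open("brown.wav", "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 16-bit samples
        wav.setframerate(RATE)
        wav.writeframes(samples.tobytes())

    The raw cumulative sum wanders at subsonic frequencies, so in practice a gentle high-pass filter and some volume shaping help.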

    My first attempt just sounded like static to me. So I did a little tweaking with the sox audio filters.

    My second attempt gave me a very distinct calming sensation. So I did a few more adjustments.

    I knew the third attempt worked because I woke up 30 minutes later, still sitting at the keyboard. Totally well rested. The mp3 had ended 15 minutes before I woke up.

    I added this song to my "favorites" playlist and now I use it every time I take a nap. Seriously:
    1. Lie down in a semi-dark room and hit play on "Brown".
    2. Try not to think for just two minutes. (For me, turning off all of the thoughts in my head is like trying to hold my breath. I can't do it for very long.) Don't quietly repeat to yourself "don't think". Don't think about the next things you need to do. Don't listen for the settling of the house, or anything else. Just focus on the brown noise.
    For me, I'm typically out cold after 1-3 minutes. The mp3 lasts for 15 minutes. I usually end up napping for 15-30 minutes, but occasionally for an entire hour. The sleep duration definitely varies per person. Test it on yourself when you have a quiet afternoon. Maybe set an alarm if you don't want to sleep too long.

    I even use this when traveling by plane. Right after the flight attendant gives the safety presentation, I put on the headphones and hit play. I'm usually asleep before liftoff and I wake up after we reach cruising altitude.

    Try it!

    I've packaged up the mp3 files as an 'album' and uploaded it to my music server. On the album, I have 15 minutes of white, green, and brown noise. I also have some chicken sounds on the album because those help you snap out of it when the song ends.


    Tags: #Network, #Programming, #Forensics